Foreword: Alan Simpson's tribute to his co-author
My co-author and friend Professor A. R. Collar, C.B.E., LL.D., F.R.S., Emeritus Professor of Aeronautical Engineering in the University of Bristol, distinguished aeronautical scientist and former Vice-Chancellor of that University, died at his home in Bristol on 12th February 1986, just ten days before his 78th birthday, and only a short time before our book was published.
Roderick's eminence as an engineering scientist and scholar is well known.
However, he was also a man of exceptional personal warmth and charm.
He was, in every sense, a gentleman and a gentle man.
Nevertheless, when the occasion demanded it (as it did during the turbulent period in which he became Vice-Chancellor of Bristol University), he showed great courage and firmness, the latter always in the nicest possible way.
Roderick's interests were wide; for example sport (cricket, tennis and football, both varieties), good music (he played the violin), verse (he read extensively and composed, when inspired or provoked, to match an occasion, and could recite extensively from memory, particularly W. S. Gilbert), games and puzzles (chess, cribbage, mathematical conundrums and crosswords; he could, at his zenith, do The Times Crossword any day in less than twenty minutes), and art (he was an active member of The Bristol Savages).
He was also president of a local Gilbert and Sullivan society for many years.
He did much hospital work: for several years up to his retirement he had been Chairman of the South West Regional Hospital Management Board and was an active hospital visitor.
His charitable works were extensive.
He had been President of the Bristol Rotary Club, chairman of governors of many schools and colleges (notably R.M.C.S., Shrivenham), and was a Liveryman of the Guild of Air Pilots and Navigators, to mention but a few of his activities.
He was the perfect committee chairman: he loved committee work when it was productive, and it always was when he was in the chair.
Innate within him was the ability to put committee members at their ease, so that they could give of their best, and he could abstract the maximum amount of information.
This ability was evident in his many years of A.R.C.
Committee and Council chairmanship and in his chairmanship of the university committees, culminating in his Vice-Chancellorship.
At one time he was on 25 committees, over and above those within the University, and was chairman of many of these.
At the height of this committee and other administrative work, Roderick often expressed his desire to return to his mathematics and, in particular, to write a sequel to his famous book (with R.A. Frazer and W.J. Duncan) Elementary Matrices.
I had agreed to be co-author, but it was not until seven years after his retirement that work could begin, for only then had his committee activities abated to an extent which allowed the long periods necessary for writing and discussion.
It is appropriate to recall that Elementary Matrices (1938) was printed nine times in the U.K. and several times in the U.S.A., was translated into Russian and Czech, and was finally republished in hardback in the U.S.A. in 1983.
It was written following a period of pioneering work on the application of matrices to engineering problems, undertaken by Frazer, Duncan and Collar at the N.P.L., Teddington, in the 1930s.
Collar was certainly the junior member of the partnership, but those who know his mathematical style are well able to perceive his many areas of contribution to the book.
In later years, Roderick confided that he had been far from happy with the title, for the book certainly was not "elementary".
Any sequel must be "more elementary" and tailored to the needs of the engineering student.
This is not to be viewed as condescending: Roderick regarded himself as a student of the subject and said so on many occasions during the writing of the new book.
As work began in earnest on the book, I realised that Roderick had lost none of the skills that I, as his research student twenty years before, had regarded with awe.
But writing now was not easy for him: he had begun to suffer with arthritis in the year following his retirement, and had had to give up his violin owing to lack of flexibility of his fingers, and as the condition developed, he found writing more and more difficult.
This did not deter him, as, between 1980 and 1984, he completed over 400 pages of manuscript.
On average, we met for three hours every ten days, constantly revising, exchanging and criticising so that in the end we had both had a hand in everything.
The whole must have been rewritten about five times.
Roderick was always seeking a better way of putting things in order to remove all possible ambiguities and to assist the reader to the maximum extent: if this meant rewriting 100 times he would gladly do it.
He was meticulous, but never pedantic.
Roderick's last, and possibly finest, contributions to this book were the Theorems XII and XV of Chapter 1 which were completed in 1985 shortly after the serious fall which perhaps precipitated the leukaemia from which he failed to recover, despite repeated blood transfusions.
At this stage, the arthritis in his fingers was scarcely bearable and writing extremely painful.
But the exceptional clarity of his thought processes was evidently still totally unimpaired.
We last met on the Sunday before his death.
Although very weak, he was anxious to discuss technical matters, among which were possible revisions to Chapter 6.
He had, a few weeks before, completed an excellent set of autobiographical notes and, characteristically, a sonnet to his wife, Bobbie, on her birthday.
Roderick will be sadly missed in every circle in which he moved, but most of all, of course, by Bobbie, whose devotion and fortitude was his greatest comfort in his final illness, and whose forbearance over many years enabled Roderick to spend such a great amount of their valuable time on the book.
A.S., February 1986.
INTRODUCTION
In this text, which is aimed at senior undergraduate students of engineering, the subject of matrices is described and developed before being applied to some of the problems of engineering dynamics.
The matrix theory is presented in classical algebraic form with no recourse to the notions and nomenclature of vector space theory.
In this respect, the book is a sequel to the earlier work Elementary Matrices by Frazer, Duncan and Collar, a book which, by presenting problems in a form which could be assimilated by computers, stimulated the growth of the latter.
In the present book, the subject is brought up to date and related to modern computer methods.
The text is punctuated throughout by numerical examples, most usually based on matrices of small order.
The treatment of engineering dynamics is almost invariably linear, although examples of simple non-linear formulations are provided.
Chapter 1 contains definitions of the basic laws of matrix manipulation and of matrix types.
Matrix calculus is introduced via the Sylvester expansion theorem and considerable emphasis is placed on the matrix eigenvalue problem.
A useful collection of theorems and proofs is presented at the end of the chapter: among these is what is thought to be a novel proof of Sylvester's law of degeneracy.
Chapter 2 is concerned with numerical methods: it divides into two major parts, namely methods of reciprocation, triangulation and solution of linear algebraic equations, and methods for the solution of eigenvalue problems.
Both parts address many alternative procedures, all of which are exemplified, most often by using matrices of order 4.
Chapter 3 concerns neighbour systems, a subject of great importance in engineering dynamics.
The Collar-Jahn method, along with several new variants, is presented.
The difficult problems associated with adjacent or confluent eigenvalues are outlined and methods of solution developed and discussed.
The chapter is thought to be the first on this subject in a text of this type.
Fundamental concepts of dynamical systems are introduced in the simplest of terms, in Chapter 4.
Formulations based on the principle of virtual work are given emphasis: conservative and non-conservative actions are carefully delineated and discussed in the context of discrete systems.
The important notion of semi-rigidity is introduced.
The ideas of Chapter 4 are carried forward in Chapter 5 via a proof of Hamilton's principle and the associated proof of the Lagrange equations.
Conservative and non-conservative actions are identified by the form of the matrices appearing as coefficient arrays in the equations of motion.
A general treatment of the linear conservative dynamical system is given, including calculation of natural frequencies and normal modes, and topics such as removal of zero frequency roots.
An outline of structural damping modelling is also given.
Chapter 6 deals in general terms with the solution of sets of linear differential equations with constant coefficients.
The concept of the matrix Laplace transform is introduced.
In the second part of this chapter, the subject of linear stability theory is discussed, leading from the fundamental encirclement theorem of Cauchy, via the criteria of Leonhard and Routh, to the Poincaré-Liapunov method.
An inverse method for the calculation of stability boundaries is also discussed.
Chapter 7 gives an extended treatment of the dynamics problems of continuous linear systems by use of the semi-rigid method in matrix form.
Emphasis is placed on one-dimensional systems, but flat plate problems are also discussed.
Exact formulations based on "dynamic stiffness" or "receptance" are highlighted, and the concept of "finite element" is introduced as a natural extension of the semi-rigid method.
Finally, Chapter 8 deals with the dynamics of composite systems in a manner which owes much to the pioneering work of Kron.
The finite element and dynamic stiffness methods are introduced and exemplified in simple cases.
The importance of the connexion graph is emphasised.
The chapter ends with an extensive discussion of Kron's eigenvalue method followed by proofs of the Wittrick-Williams and Simpson counting algorithms.
The book will be accompanied by a "computational adjunct" in which a number of BASIC programs for use on home computers will be presented.
These programs, which will be available on disc, relate to specific topics covered in the main text and may be used to verify calculations performed therein.
Some of the programs, however, are of the general-purpose type.
Properties of Matrices
1.1 ASSUMPTIONS, CONVENTIONS; TRANSPOSITION AND RELATED DEFINITIONS
In this chapter we shall discuss certain characteristics and properties of matrices which are of particular importance in dynamical studies of systems with multiple degrees of freedom.
We assume the reader to be familiar with the notation of matrices and with the rules of combination: addition, multiplication, etc., and with conformability and non-permutability.
We also assume familiarity with the elementary rules for combination and evaluation of determinants.
In general, matrices are rectangular; but of particular importance are matrices having equal numbers of rows and columns (i.e. square matrices) and those having only a single row or column, which we call vectors; both types are special forms of rectangular matrices.
Normally, we shall use capital bold type, e.g. A, to denote square and rectangular matrices having two or more rows and columns, and lower case bold type, e.g. x, to denote vectors.
If a matrix A has m rows and n columns, i.e. it is of order (m × n), its typical element [formula] lies at the intersection of the ith row and jth column ([formula]).
If we form the matrix [formula] by writing columns for rows in order, the new matrix, of order (n × m), is called the transpose of A and is denoted by [formula].
For example, [formula].
The process is clearly one of reflection in the diagonal containing the terms [formula], which is described as the principal diagonal.
It is of particular importance in square matrices.
The terms [formula] in A form the superdiagonal and [formula] the infradiagonal, while for a square matrix of order (n × n), or more simply of order n, the elements [formula] form the secondary diagonal.
An important property of a square matrix is its trace; this is the sum of all the elements in the principal diagonal.
Wherever it saves space, we shall write a column vector x as [formula]; the braces { } denote that this is to be read as a column.
Since in dynamics, we are usually more often concerned with columns than with rows, we shall write a row vector as the transpose of a column vector, e.g. [formula].
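The conventions above are easily mechanised. The following Python sketch (illustrative only, not part of the text; the matrices are assumed examples) expresses transposition and the trace using plain lists of rows:

```python
# Illustrative sketch: transposition writes columns for rows in order,
# and the trace sums the principal diagonal of a square matrix.

def transpose(A):
    """Return the transpose of A (rows become columns)."""
    return [list(col) for col in zip(*A)]

def trace(A):
    """Sum of the elements in the principal diagonal (square A)."""
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2, 3],
     [4, 5, 6]]                     # a (2 x 3) matrix

assert transpose(A) == [[1, 4], [2, 5], [3, 6]]   # its (3 x 2) transpose

S = [[1, 2],
     [3, 4]]
assert trace(S) == 5
```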
1.2 SYMMETRIC AND SKEW-SYMMETRIC MATRICES
If [formula], the matrices, which are necessarily square, are termed symmetric; if [formula], they are said to be skew-symmetric: in this case, since [formula] = [formula], the principal diagonal has zero elements.
A square matrix having zeros everywhere except in the principal diagonal is called a diagonal matrix and is clearly symmetric.
The unit matrix I is a diagonal matrix of unit elements; a matrix having all its elements zero is said to be null and is written 0.
It is possible to express any square matrix as the sum of a symmetric and a skew-symmetric matrix; this is shown by the identity [formula].
The first term is unaltered by transposition; the second changes sign.
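The decomposition can be checked numerically. This Python sketch (an assumed helper, not from the text) forms the symmetric and skew-symmetric parts and verifies that they sum to A:

```python
# Sketch: A = (A + A^T)/2 + (A - A^T)/2, the first part symmetric,
# the second skew-symmetric with a zero principal diagonal.
from fractions import Fraction

def sym_skew_split(A):
    n = len(A)
    S = [[Fraction(A[i][j] + A[j][i], 2) for j in range(n)] for i in range(n)]
    K = [[Fraction(A[i][j] - A[j][i], 2) for j in range(n)] for i in range(n)]
    return S, K

A = [[1, 2],
     [0, 4]]
S, K = sym_skew_split(A)

assert S == [[1, 1], [1, 4]]        # unaltered by transposition
assert K == [[0, 1], [-1, 0]]       # changes sign; zero diagonal
assert [[S[i][j] + K[i][j] for j in range(2)] for i in range(2)] == A
```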
1.3 REVERSAL OF ORDER ON TRANSPOSITION
If we have a matrix product A = BC, where B is of order (m × r) and C of order (r × n), then A is of order (m × n).
If [formula], B and C are conformable only in that order.
The typical product element is [formula].
Transpose B and C; then [formula] are of orders (r × m) and (n × r) respectively, and are therefore conformable only in the order [formula].
Moreover [formula], or [formula].
Hence transposition of a matrix product requires reversal of the order of the matrices.
By an obvious extension, if A = BCD, then [formula], and so on.
In view of the formulae (1) and (2) it is clear that these results hold whether or not B, C are conformable both in the order BC and in the order CB, including the case where both are square.
For the unit and null matrices, (1) implies [formula].
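The reversal rule is readily verified numerically; in this illustrative Python sketch the matrices B and C are assumed example values:

```python
# A numerical check of the reversal rule (BC)^T = C^T B^T.

def matmul(A, B):
    """Row-by-column product of conformable matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

B = [[1, 2, 3],
     [4, 5, 6]]                     # (2 x 3)
C = [[1, 0],
     [2, 1],
     [0, 3]]                        # (3 x 2)

# Transposition of the product reverses the order of the factors.
assert transpose(matmul(B, C)) == matmul(transpose(C), transpose(B))
```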
1.4 PARTITIONING OF MATRICES
Any matrix may be partitioned, by drawing dotted lines between adjacent columns and adjacent rows, into submatrices.
We do this, notionally, in any matrix multiplication: if we are performing the operation BC = A, then we isolate a row of B and a column of C, respectively (1 × n) and (n × 1) submatrices, to produce an element, i.e. a (1 × 1) submatrix of A. Similarly, we could, for example, isolate the first three rows of B and the first four columns of C and multiply these submatrices, of order (3 × n) and (n × 4), to give a (3 × 4) submatrix in the top left corner of A. Again, since [formula] it makes no difference if we divide the range, s, of summation into subranges; each subrange can be summed, and these sums added to give the total.
This is equivalent to vertical partitioning in B with conformable partitioning in C. An example will make this clear.
The product BC = A could be [formula], where B is (4 × 5), C is (5 × 6) and A is (4 × 6).
Here B has been partitioned between the second and third rows to give, at the top, a (2 × 5) submatrix (ignore the vertical partition for the moment); C between the fourth and fifth columns to give, on the left, a (5 × 4) submatrix.
Multiplied together, these give the (2 × 4) submatrix at the top left in A. The vertical partition in B, after the third column, and the corresponding partition in C, which must be after the third row, merely divide each summation into two parts.
Symbolically, we may write the operation as [formula], where the elements [formula] etc are now the above submatrices.
It is important to note the conformability; thus [formula] is (2 × 3) × (3 × 4) = (2 × 4) also; and [formula] is also (2 × 4).
It is evident that, though partitioning is largely arbitrary, conformability both in multiplication and in addition is essential.
Partitioning is often of great use, both in analytical manipulation of matrices and in numerical cases, especially when submatrices can be chosen having simple forms, such as a unit or a null submatrix.
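The submatrix rule can be confirmed on a small assumed example: the top-left block of A = BC equals the product of the corresponding submatrices of B and C. A Python sketch (partition sizes chosen for illustration, not taken from the text):

```python
# Sketch of conformable partitioning: the top-left block of A = BC
# equals the product of the corresponding submatrices.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

B = [[1, 2, 0, 1],
     [0, 1, 3, 2],
     [2, 0, 1, 1]]                  # (3 x 4), partitioned after row 2
C = [[1, 0, 2],
     [2, 1, 0],
     [0, 3, 1],
     [1, 1, 2]]                     # (4 x 3), partitioned after column 2

A = matmul(B, C)

# (2 x 4) top rows of B times (4 x 2) left columns of C:
top_left = matmul(B[:2], [row[:2] for row in C])
assert top_left == [row[:2] for row in A[:2]]
```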
1.5 SINGULAR MATRICES, DEGENERACY AND RANK
In a square matrix A of order n the array of elements has a determinant [formula] which is in general nonzero: this implies that the columns (and the rows) are linearly independent, so that there is no nonzero column vector x such that Ax vanishes.
If, however, a vector x does exist for which Ax = 0, this clearly means that any column of A can be expressed as a linear sum of the remaining columns; by the rules for evaluation of determinants this requires [formula] to be zero.
If two different vectors x exist for which Ax = 0, then not only is the determinant zero, but so also are all the minor determinants of order (n − 1).
When [formula], the matrix A is said to be singular.
If only one vector x exists for which Ax = 0, A is said to be simply degenerate, or to have simple degeneracy; if more than one such vector x exists, A has multiple degeneracy.
Complementary to the definition of degeneracy is that of rank: for a square matrix, the sum of the degeneracy and the rank is the order of the matrix; thus [formula]. The rank is in fact the order of the largest nonvanishing minor of [formula], so that rank 0 implies a null matrix.
We shall be much concerned in this text with matrices which are simply degenerate (degeneracy 1) and with matrices in which all columns (and all rows) are proportional to each other; in the latter case all second-order minors vanish, so that the rank is 1.
In general, for a rectangular matrix, the rank is the number of linearly independent rows (and columns).
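The rank of a numerical matrix may be found by elimination; the following Python sketch (an illustrative routine, not one of the book's methods, with exact fractions to avoid rounding) checks the rank and degeneracy of a matrix whose third column is the sum of the first two:

```python
# Illustrative rank computation by exact Gaussian elimination.
from fractions import Fraction

def rank(A):
    M = [[Fraction(x) for x in row] for row in A]
    nrows, ncols = len(M), len(M[0])
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, nrows) if M[i][c] != 0), None)
        if pivot is None:
            continue                # this column adds no independent row
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(nrows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Third column is the sum of the first two: degeneracy 1, rank 2.
A = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 2]]
assert rank(A) == 2
assert len(A) - rank(A) == 1        # degeneracy + rank = order
```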
1.6 FACTORISATION OF MATRICES
All matrices can be factorised, usually in a variety of ways.
In the simplest (trivial) case we may write A = AI = IA where I is the conformable unit matrix.
Even I itself can be factorised in a variety of ways; as one example, if J is the square matrix which has units in its secondary diagonal and zeros elsewhere, which we shall call the reversing matrix, then [formula]; e.g. for order 3 [formula] so that J is a double factor of I. The ability to factorise a matrix appropriately can often be of great assistance in solving practical problems.
Singular matrices can be factorised as the product of two rectangular matrices (again, in more ways than one) the orders of which are determined by the rank of the matrix.
This may be illustrated by a simple numerical example.
Let [formula] Inspection shows the third column to be the sum of the first two.
Hence if we partition the matrix appropriately, [formula] where the submatrix products are written separately.
If now we postmultiply the (3 × 2) submatrix of A by I, we can add the last result to it to recover A as the matrix product [formula] Finally, a matrix of rank 1 can be expressed as the product of a column and a row in that order, for such a matrix has effectively only one independent column, all the other columns being proportional to it; similarly for the rows.
For example [formula] In general, any singular matrix of order n and rank r is expressible in an infinite number of ways as the product of two matrices of orders (n × r) and (r × n), since r is the number of independent columns (and rows).
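The rank-1 factorisation is immediate to exhibit in code; the numbers in this Python sketch are assumed for illustration:

```python
# Sketch: a rank-1 matrix as a column times a row.

def outer(col, row):
    """Column (n x 1) times row (1 x m): an (n x m) matrix of rank 1."""
    return [[c * r for r in row] for c in col]

A = outer([1, 2, 3], [4, 5])
assert A == [[4, 5], [8, 10], [12, 15]]

# Every column is proportional to the first, so all 2nd-order minors vanish.
assert all(A[i][0] * A[j][1] - A[j][0] * A[i][1] == 0
           for i in range(3) for j in range(3))
```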
1.7 ADJUGATE AND RECIPROCAL MATRICES
If [formula] is a square matrix of order n, and if, in the determinant [formula], the element [formula] has the cofactor [formula] (that is, the value of the determinant obtained from [formula] by deleting the ith row and jth column, leaving a minor of order (n − 1) which is then multiplied by [formula]), then by the rules for evaluation of determinants [formula] while [formula] since the summation in the latter result gives the value of a determinant having two equal rows.
It follows that if the cofactors [formula] are arranged as a matrix [formula] of order n, then [formula] where, to conform with matrix multiplication rules, we must use the transpose of [formula] to postmultiply A. If instead of a row summation as in (1) we use column summation, we find also [formula] so that A and [formula] are permutable.
The matrix [formula] is called the adjugate of A. If A is singular, so that [formula] vanishes, then [formula] In this case, if A is simply degenerate, [formula] is of rank 1; but if A is multiply degenerate, so that all the cofactors of Aij vanish for all i, j, then [formula] is null.
These results are examples of Sylvester's law of degeneracy (see Theorem XII of 1.22), viz. the degeneracy of the product of two matrices is at least as great as the degeneracy of either factor, and at most as great as the sum of the degeneracies of the factors.
When [formula], we may divide [formula] by [formula], calling the result R; then AR = I, and R can be described as the reciprocal or the inverse of A. We have already seen that it permutes with A; however, this is readily proved in another way.
Let AR1 = I and R2A = I; from the latter, postmultiplying by R1, we have R2AR1 = R1, and using the former, R2 = R1.
Moreover, the reciprocal is unique; for if AR1 = I and AR2 = I, then A(R1 − R2) = 0, and on premultiplication by R1 we have R1 = R2.
In conformity with usual algebraic notation, the reciprocal of A is written [formula] so that a separate symbol is unnecessary.
Thus [formula] expresses the essential property of the reciprocal of A.
Just as with transposition, if a matrix product is inverted the order of the factors must be reversed.
Let A = BCD, where the matrices are all square and non-singular.
By premultiplication or postmultiplication as required, we have successively [formula]
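The reversal rule for reciprocals can be checked with exact arithmetic; this Python sketch uses the 2 × 2 adjugate-over-determinant formula and assumed example matrices:

```python
# Numerical check that inversion of a product reverses the factor
# order: (BC)^-1 = C^-1 B^-1.
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(A):
    """Reciprocal of a non-singular 2x2 matrix: adjugate over determinant."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

B = [[1, 2], [3, 4]]
C = [[2, 0], [1, 1]]
I = [[1, 0], [0, 1]]

# C^-1 B^-1 is a reciprocal of BC, and the reciprocal is unique.
assert matmul(matmul(inv2(C), inv2(B)), matmul(B, C)) == I
```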
1.8 POWERS OF MATRICES, POLYNOMIALS AND SERIES
We have just established the existence of a negative power of a square non-singular matrix.
In general, any square matrix can be raised to any positive integral power by continued multiplication, in any order, by itself.
Any such powers are evidently permutable; for example [formula] Similarly, any power of a matrix may itself be raised to a power; for example [formula] while if A is non-singular [formula] Evidently, the index laws of ordinary algebra apply.
In particular, [formula] where A is non-singular; in fact [formula] applies for all square matrices.
Just as we can have polynomials involving powers of a scalar quantity λ, e.g. [formula], so we can have polynomials of a square matrix A: [formula] where the [formula] are scalar multipliers.
Polynomials of matrices are of great importance in a variety of ways.
For example, suppose we can find a set of multipliers pi such that P(A) vanishes.
Then, if [formula] exists, we may premultiply or postmultiply P(A) by [formula] to obtain [formula] as a convenient formula for computing [formula].
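For a 2 × 2 matrix the Cayley-Hamilton theorem supplies just such a vanishing polynomial, A^2 - (trace A) A + (det A) I = 0, whence A^-1 = ((trace A) I - A)/det A. A Python sketch with assumed example values:

```python
# Sketch of the idea in the text: premultiplying a vanishing polynomial
# P(A) = 0 by A^-1 yields a formula for A^-1.  For 2x2 matrices,
# Cayley-Hamilton gives A^-1 = ((trace A) I - A) / det A.
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[4, 7],
     [2, 6]]
tr = A[0][0] + A[1][1]                                  # trace = 10
det = Fraction(A[0][0] * A[1][1] - A[0][1] * A[1][0])   # det = 10

A_inv = [[(tr * (1 if i == j else 0) - A[i][j]) / det for j in range(2)]
         for i in range(2)]

assert matmul(A, A_inv) == [[1, 0], [0, 1]]
```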
We can also define matrix series; for example the parallel series [formula] exp A converges in exactly the same way as exp λ.
In parallel with the scalar series, exp (−A) is the reciprocal of exp A. If, as in (2) and (3), we have a polynomial or series in a scalar quantity λ and the corresponding function in the square matrix A, we may multiply the scalar function by I to obtain the two matrix functions [formula] Subtraction term by term then gives [formula] where Q is a function we need not evaluate here.
The matrix [formula] is of very great importance, not only in studies of the intrinsic properties of A, but also in dynamical and other applications; it is called the characteristic matrix of A.
1.9 SPECIAL TYPES OF MATRIX
We have already defined symmetric, skew-symmetric, diagonal, unit and null matrices.
We now summarise and extend these definitions.
(1) Symmetric matrix - characterised by A = A^T.
(2) Skew-symmetric matrix - characterised by A = -A^T.
(3) Diagonal matrix - characterised by [formula].
We now establish an important property of diagonal matrices.
Let C be any square matrix and D a conformable diagonal matrix with diagonal elements [formula].
Then postmultiplication of C by D multiplies the columns of C in order by [formula], etc., while premultiplication of C by D multiplies the rows of C in order by [formula], etc.
Thus, the typical element of CD is [formula] while that of DC is [formula].
It follows that if C and D permute, so that DC = CD, then [formula] if [formula].
For a diagonal matrix D having all its diagonal elements different, it follows that C must then also be diagonal.
If D has some equal diagonal elements, then C can possess nonzero off-diagonal elements in a "diagonal submatrix" corresponding to the equal elements of D.
For example, if [formula] which we write as D = Diag(1,2,3), and [formula] then if C and D are to permute we must have [formula] so that d = e = f = g = h = k = 0, leaving C = Diag (a, b, c).
However, if D = Diag(1, 1, 2) we have that [formula] so that e = g = h = k = 0 leaving [formula] which we may write as Block Diag [formula] where [formula] Finally, we note that if [formula] for all i, then [formula].
This follows from the identity [formula].
(4) Unit matrix - this is the diagonal matrix with units in the diagonal positions; it is written I.
(5) Scalar matrix - this is the diagonal matrix with the same scalar quantity, say [formula], in the diagonal positions.
Evidently such a matrix may be written [formula]; multiplication by a scalar matrix therefore implies multiplication by a scalar quantity.
(6) Reversing matrix - this is the square matrix having units in its secondary diagonal and zeros elsewhere; it is written J. Postmultiplication of any matrix C by the conformable J matrix reverses the order of the columns of C, while premultiplication by J reverses the order of the rows.
(7) Null matrix - this has zero elements throughout.
(8) Triangular matrices - these are of two types: lower triangular, L, and upper triangular, U. L is characterised by a typical element having the property [formula] while U is characterised by [formula].
Thus, for example [formula] are, respectively, lower and upper triangular matrices.
An obvious and useful property of triangular matrices is that their determinants are simply the products of their diagonal terms.
Thus, in the above example, [formula] Another useful property of triangular matrices is the ease with which their reciprocals may be calculated.
For example, if R = (Rij) is the reciprocal of a lower triangular matrix L then [formula] where a, b, c,... must be nonzero; otherwise [formula] and R does not exist.
The first row of the product gives [formula] and in view of this, the second row, excluding its first element, gives [formula] and so on.
Accordingly, R is also lower triangular, and its diagonal elements are [formula], as is otherwise obvious from consideration of the cofactors of a, b, c,... in L. Hence a computer program to find R needs to calculate only the n(n − 1)/2 elements below the diagonal; and these are found progressively.
For example [formula] gives, from the first column, [formula] and from the second column, [formula] Transposition of the above shows that the reciprocal of an upper triangular matrix U is itself upper triangular, and that its diagonal elements are the reciprocals of those of U.
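The progressive calculation just described is easily programmed; in this illustrative Python sketch (assumed values, exact fractions) the reciprocal of a lower triangular matrix is built column by column and checked against the unit matrix:

```python
# Sketch of the forward-substitution reciprocal of a lower triangular
# matrix: the diagonal of R holds the reciprocals of L's diagonal, and
# the below-diagonal elements are found progressively.
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv_lower(L):
    """Reciprocal of a lower triangular matrix with nonzero diagonal."""
    n = len(L)
    R = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = 1 / Fraction(L[j][j])
        for i in range(j + 1, n):
            s = sum(L[i][k] * R[k][j] for k in range(j, i))
            R[i][j] = -s / Fraction(L[i][i])
    return R

L = [[2, 0, 0],
     [1, 3, 0],
     [4, 5, 6]]
R = inv_lower(L)

assert matmul(L, R) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```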
(9) Persymmetric matrix - this is a square matrix in which the elements in any line parallel to the secondary diagonal are equal.
Thus, if A is persymmetric, it is characterised by the typical element [formula] and hence has, in general, 2n − 1 independent elements.
For example, the matrix [formula] is persymmetric.
(10) Centrosymmetric and centroskew matrices - a matrix which is symmetric about the centre point of its array is said to be centrosymmetric; thus, if A is centrosymmetric, of order n, it is characterised by [formula] More simply, if J is the reversing matrix, a centrosymmetric matrix has the property A = JAJ. A centroskew matrix is characterised by the equation A = −JAJ.
For example, the matrices [formula] are centrosymmetric, while the matrices [formula] are centroskew.
Centrosymmetric matrices and centroskew vectors arise in the study of the dynamics of mechanical systems having physical symmetry (e.g. a suspension bridge).
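Both characterisations can be verified mechanically; the matrices in this Python sketch are assumed examples:

```python
# Illustrative check of the reversing-matrix characterisations:
# A = JAJ for a centrosymmetric matrix, A = -JAJ for a centroskew one.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def J(n):
    """The reversing matrix: units in the secondary diagonal."""
    return [[1 if i + j == n - 1 else 0 for j in range(n)] for i in range(n)]

A = [[1, 2, 3],
     [4, 5, 4],
     [3, 2, 1]]                     # symmetric about its centre point
JAJ = matmul(matmul(J(3), A), J(3))
assert JAJ == A

K = [[1, 2, 0],
     [4, 0, -4],
     [0, -2, -1]]                   # centroskew
JKJ = matmul(matmul(J(3), K), J(3))
assert JKJ == [[-x for x in row] for row in K]
```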
(11) Orthogonal matrices - a square, non-singular matrix A having the property [formula] is said to be orthogonal.
Orthogonal matrices are of very great importance in dynamics.
For example, the matrix [formula] is orthogonal, since [formula] It should be noted that the determinant of the above orthogonal matrix is unity: in general an orthogonal matrix will have [formula] as its determinant since [formula]
(12) Hermitian and skew-Hermitian matrices - if a matrix A has complex elements, it can clearly be written in the form [formula] where B and C are real matrices and [formula] is the conjugate of A. If B is symmetric and C is skew-symmetric, A is said to be Hermitian.
Evidently in this case [formula] If B is skew-symmetric and C is symmetric, A is said to be skew-Hermitian.
Evidently in this case [formula] For example, the matrix [formula] is Hermitian, while the matrix [formula] is skew-Hermitian.
It should be noted that if A is Hermitian, then iA is skew-Hermitian.
(13) The isolating vector ei - this vector is the ith column of the unit matrix; it has a unit in the ith element and zeros elsewhere.
The operation [formula] isolates the ith row of A, while Aej isolates the jth column.
The combined operation [formula] gives the isolated element of [formula]
(14) The summing vector [formula] - this is the vector of which all elements are units.
Evidently [formula] gives a row which is the sum of all the rows of A, and the product As gives a column which is the sum of all the columns of A. Its principal use is in the checking of numerical operations.
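The isolating and summing operations can be sketched as follows (illustrative Python helpers; the function names are not the book's notation):

```python
# Sketch: e_i^T A isolates a row, A e_j isolates a column, and the
# summing vector of units sums rows or columns.

def e(i, n):
    """Isolating vector e_i: the ith column of the unit matrix (0-based)."""
    return [1 if k == i else 0 for k in range(n)]

def vec_mat(v, A):                  # v^T A: a row
    return [sum(v[k] * A[k][j] for k in range(len(A)))
            for j in range(len(A[0]))]

def mat_vec(A, v):                  # A v: a column
    return [sum(A[i][k] * v[k] for k in range(len(v)))
            for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]

assert vec_mat(e(1, 2), A) == [4, 5, 6]   # isolates the second row
assert mat_vec(A, e(2, 3)) == [3, 6]      # isolates the third column

s = [1, 1, 1]                             # summing vector
assert mat_vec(A, s) == [6, 15]           # sum of all the columns of A
assert vec_mat([1, 1], A) == [5, 7, 9]    # sum of all the rows of A
```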
The above list is by no means exhaustive, but it covers most of the matrix types which arise in the present text.
1.10 VARIABLE MATRICES, LINEAR SUBSTITUTIONS AND ALGEBRAIC EQUATIONS
So far, we have discussed the properties of matrices, and the algebra of matrices, in general terms, and without reference to the nature of the elements: where illustrations have been given we have used arithmetical numbers for elements.
But the elements can of course be variables.
They may, for example, all be functions of time, so that we must write A = A(t); they may be functions of a parameter, typified by [formula] (when the elements are rational integral functions of λ, the matrix is called a lambda-matrix); or the elements may themselves (especially for vectors) be independent variables.
Finally, when sets of differential equations are written in matrix form, some of the matrices will be, in effect, operators.
We now proceed to examine some simple cases.
Suppose first that we have a set of variables [formula] related to a second set [formula] by an equation of the type y = Ax; such a relation might, for example, connect the coordinates, referred to two different sets of axes, of a point moving in n-space.
Such a relation is called a linear substitution.
Apart from its intrinsic importance, it is of interest since the subject of matrices evolved from it.
Cayley took a set of equations such as [formula] and enclosed each side in brackets, so that two equal columns could be connected by a single equality sign.
He then extracted the vector x and wrote it vertically as a parallel to y, so requiring the row-by-column rule for multiplication.
A second substitution of the same kind, such as z = By, implies z = By = BAx, so that BA is the matrix converting x into z.
If the values of x are unknown, but y is given (say y = b) then we have a set of linear algebraic equations for x: Ax = b, of which the formal solution is [formula]: however, various well known methods exist for finding x without computing [formula] (see 2.5).
As a very simple example of a linear substitution, consider the coordinates of a point P relative to two sets of axes [formula] mutually inclined at an angle [formula]. Evidently [formula]; elimination of r, [formula] gives [formula] and it should be noted that in this case the matrix of transformation is orthogonal.
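The orthogonality of such a rotation-of-axes matrix is easily confirmed numerically; in this Python sketch the angle is an assumed value:

```python
# Sketch: the matrix of a plane rotation of axes is orthogonal, so its
# transpose times itself is the unit matrix (to rounding error).
import math

t = 0.3                             # an arbitrary angle, in radians
A = [[math.cos(t), math.sin(t)],
     [-math.sin(t), math.cos(t)]]

AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]           # A^T A

assert all(abs(AtA[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))
```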
The formal solution [formula] (3) assumes that A is non-singular.
Suppose, however, that A has degeneracy s.
Then s of the rows implicit in (3) may be compounded from the remaining n − s rows, and so provide no extra information.
These s rows, including the appropriate elements of b, may thus be discarded.
We are left with n − s rows which may be written in partitioned matrix form as [formula] where A1 is a non-singular submatrix of A of order (n − s) (some reordering of the elements of x may be necessary to obtain the non-singular minor), [formula] is of order [formula], and c is the reduced vector b; x has been partitioned conformably into n − s elements (y) and s elements (z).
We may write this equation as [formula]. Equation (5) is known as a parametric solution: since we have only n − s independent relations we can determine only n − s unknowns (y) in terms of the remaining s quantities (z), which may be regarded as arbitrary parameters in the problem.
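The parametric solution may be illustrated with a small rank-deficient system; the matrix below is invented for the purpose (its third row is the sum of the first two, so s = 1):

```python
import numpy as np

# A rank-deficient system Ax = b: third row = first + second, so degeneracy s = 1.
A = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 2.0]])
b = np.array([4.0, 2.0, 6.0])   # consistent: b[2] = b[0] + b[1]

# Discard the dependent row and partition: A1 (non-singular, of order n - s),
# A2, and the reduced vector c; x is partitioned into y (n - s elements)
# and the arbitrary parameter z (s elements).
A1 = A[:2, :2]
A2 = A[:2, 2:]
c = b[:2]

def solve(z):
    """Parametric solution: y determined in terms of the arbitrary z."""
    y = np.linalg.solve(A1, c - A2 @ z)
    return np.concatenate([y, z])

# Every choice of the parameter z yields a solution of the full system.
for z in (np.array([0.0]), np.array([1.0]), np.array([-3.5])):
    x = solve(z)
    assert np.allclose(A @ x, b)
```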
1.11 BILINEAR AND QUADRATIC FORMS
Scalar functions often arise which are linear and homogeneous in two sets of variables x1,..., xn and y1,..., ym.
By inspection, it is evident that such a function may be written as [formula] when x, y are written as columns.
Here A, of order (m × n), is called the matrix of the form.
Suppose we wish to express f in terms of new variables [formula], where [formula] then [formula] so that [formula] is the matrix of the new form.
We say that A and [formula] are connected by an equivalent transformation.
If the elements of x, y are independent variables, we may differentiate with respect to them.
For example, from (1) above, [formula] is the first element in the column Ax, [formula] the second, etc., so that [formula]. In this text we shall be more concerned with quadratic forms, obtained from bilinear forms such as [formula] when y = x, so that [formula], where A is now necessarily square.
Moreover, although in the formulation of A the elements Aij and Aji may not be equal, we may note that (see 1.2) [formula] and on transposition it follows that [formula] so that A may be replaced by its symmetric equivalent [formula]; we now assume that A is symmetric.
Then if we form [formula] we have [formula] since transposition of the second term repeats the first.
Thus [formula]. If we express a quadratic form [formula] in terms of another set of variables y given by the linear substitution x = By, then [formula], where [formula], and we observe that C, like A, is symmetric.
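A numerical check of this congruent substitution (the matrices below are illustrative, generated at random, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# A symmetric matrix of a quadratic form f = x'Ax.
A = rng.standard_normal((3, 3))
A = (A + A.T) / 2

# A square substitution x = By (almost surely non-singular).
B = rng.standard_normal((3, 3))

# Congruent transformation: C = B'AB is the matrix of the form in y.
C = B.T @ A @ B
assert np.allclose(C, C.T)          # C, like A, is symmetric

# The value of the form is unchanged: x'Ax = y'Cy when x = By.
y = rng.standard_normal(3)
x = B @ y
assert np.isclose(x @ A @ x, y @ C @ y)
```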
A transformation of type (2), in which A, C need not in general be symmetric, but in which B is square and non-singular, is described as a congruent transformation and is of particular importance in dynamics.
A quadratic form [formula] and the matrix A within f, are said to be positive definite (pos. def.) if [formula] for all real, non-null vectors x.
We may construct a table for such definitions (Table 1).
Necessary and sufficient conditions for a quadratic form to be pos. def. are established in Theorem XIII of 1.22.
If A is skew-symmetric, transposition shows that [formula] vanishes.
[formula]
1.12 EQUIVALENT MATRICES AND CANONICAL FORMS
We have just defined the general equivalent transformation, and this leads naturally to the important concept of equivalent matrices.
We approach this by considering the elementary operations which are commonly used in the condensation of determinants.
Let I [formula] be the unit matrix with its ith and jth rows interchanged.
The new matrix is symmetric, since both its i, jth and j, ith elements are unity; moreover, its determinant is −1, since in moving the jth row to the ith position [formula] we cross j − i rows; but the original ith row is now the (i + 1)th, and so in taking it to the jth position we cross j − i − 1 rows.
In view of its symmetry [formula] can equally be obtained by interchange of the ith and jth columns instead of rows.
Let us restrict ourselves to square matrices; then premultiplication of any matrix A by [formula] has the effect of interchanging the ith and jth rows of A; postmultiplication of A by [formula] interchanges the ith and jth columns.
Next, let [formula] be the unit matrix with the addition of an element k in the [formula] position.
Then premultiplication of A by [formula] has the effect of adding k times the jth row of A to the ith row; similarly the operation AI(kij) has the effect of adding k times the ith column of A to the jth column.
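These elementary matrices are easily constructed and their effects verified; the sketch below (Python, with an arbitrary 3 × 3 example) follows the definitions of I(i, j) and I(k):

```python
import numpy as np

def I_swap(n, i, j):
    """Unit matrix with its ith and jth rows interchanged (determinant -1)."""
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]
    return E

def I_add(n, k, i, j):
    """Unit matrix with an element k added in the (i, j) position (determinant 1)."""
    E = np.eye(n)
    E[i, j] = k
    return E

A = np.arange(9.0).reshape(3, 3)   # an arbitrary matrix to operate on

# Premultiplication by I(i, j) interchanges rows i and j of A;
# postmultiplication interchanges the corresponding columns.
assert np.allclose(I_swap(3, 0, 2) @ A, A[[2, 1, 0]])
assert np.allclose(A @ I_swap(3, 0, 2), A[:, [2, 1, 0]])

# Premultiplication by I(k; i, j) adds k times the jth row to the ith row.
B = I_add(3, 5.0, 0, 1) @ A
assert np.allclose(B[0], A[0] + 5.0 * A[1])

# The determinants agree with the text.
assert np.isclose(np.linalg.det(I_swap(3, 0, 2)), -1.0)
assert np.isclose(np.linalg.det(I_add(3, 5.0, 0, 1)), 1.0)
```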
The determinant of [formula] is clearly unity.
Finally, let [formula] be the unit matrix with the element [formula] substituted for the unit in the i, ith position.
Then I(li)A multiplies the ith row of A by l; A[formula] multiplies the ith column of A by l.
This operation is evidently an extension of the I[formula] operation; instead of adding multiples of a different row (or column) to a given row, it adds multiples of the same row; however, it differs in that now the determinant is l.
The reciprocal of I[formula] is I[formula] itself, since a double interchange of two rows restores the original matrix; the reciprocal of I[formula] is also evidently I[formula] and that of [formula].
Any product of these various matrices is always non-singular.
Two matrices, A, B are said to be equivalent if one can be derived from the other by any finite number of elementary operations of the types specified above.
Hence, if all the premultiplications and postmultiplications are grouped together to form (non-singular) matrices P and Q respectively, and A, B are equivalent, then B = PAQ and [formula]. Since k, l are arbitrary (except that [formula]) it follows that there is an infinite number of matrices B equivalent to A. However, there is one matrix which is of especially simple form; it is described as the canonical form of A.
First, let us suppose that A is non-singular.
If A11 = 0 we can bring any non-zero element to the A11 position by the use of I[formula]; we can then reduce this element to unity by the use of I(l), and then reduce all the remaining elements of the first row and the first column to zero using I(k).
We do the same for the submatrix bordered by the first row and column, and so on, so that finally we reach the unit matrix as the canonical form for A. It is to be observed that, since in using elementary operations I(i, j), I(k), I(l) on A we are effectively at each stage multiplying together determinants neither of which vanishes, the reduction of A to I must be possible when A is non-singular.
In passing, we may note that, since PAQ = I, [formula]
Now suppose A to be singular, of rank r and order n.
This means that all minor determinants of order r + 1 vanish, but at least one minor of order r is not zero.
None of the operations I (i, j), I(k), I(l), in so far as they affect the minors, can make a minor of order r + 1 nonzero, nor make the minor of order r vanish, since they merely condense these minors.
We may therefore use the processes above as for a non-singular matrix; we achieve a series of units in the [formula]positions; the other elements in the matrix are all zero.
The canonical form of A is thus [formula] where I is of order r.
If B = PAQ, where P, Q are non-singular but otherwise arbitrary, then the canonical forms for P, Q are the unit matrix.
It follows that P, Q can be constructed from I by elementary operations; accordingly, whenever B = PAQ with P, Q non-singular but otherwise arbitrary, A and B are equivalent matrices; in particular, they will have the same rank.
1.13 LAMBDA-MATRICES
We have already defined a lambda-matrix as one whose elements are rational integral functions of some parameter [formula]. Such matrices arise in many problems of mechanics.
For example, suppose we have a pair of simultaneous differential equations [formula] where the coefficients [formula] are constants.
We can write these as [formula] where the column matrices are all functions of time.
An alternative symbolic formulation would be [formula] where D = d/dt and the square matrix is now an operator on x.
Finally, if we are seeking the complementary function of the equations, we write x proportional to exp [formula] and obtain [formula] as the equations determining [formula] and the ratio [formula]. In abbreviated form this can be written [formula].
The equation determining [formula] is obtained from the vanishing of the determinant of the lambda-matrix; it is evidently a quartic in [formula] and each of the four roots will have its own associated vector x.
While it is often convenient to deal with equations such as (2) in the way indicated, it is also often useful to transform them into first-order equations.
This may be done in various ways, but one simple and useful device is to adopt two auxiliary variables: [formula] and to write the equations in the form [formula] Then if we now write [formula] and [formula] proportional to exp [formula] and again seek the complementary function, the four equations may be written, with a little rearrangement, [formula] This equation can clearly be written in partitioned form as [formula] or [formula] where [formula] The quartic determinantal equation obtained from this first-order set of equations may be seen by inspection to be identical with that obtained from the quadratic formulation.
A matrix of the form M + [formula] is called a matrix pencil (see 2.9, 2.10).
1.14 ADJUGATE LAMBDA-MATRICES
If A([formula]) is a lambda-matrix, its elements are rational integral functions of λ and so also, therefore, are the cofactors of the elements of A: hence the adjugate [formula] of A is also a lambda-matrix, and [formula]. We shall denote the determinant [formula] by [formula].
The reciprocal of A, however, being [formula] divided by [formula] is not a lambda-matrix, unless it should happen that [formula] is independent of [formula].
1.15 EQUIVALENT LAMBDA-MATRICES
If we have an equivalent transformation [formula] where P, Q may be constructed by any of the elementary operations defined in 1.12, then B and A are equivalent.
It should be noted that the determinants of P, Q must be non-vanishing constants, though [formula] may appear in them.
This implies that in an operation I(k) we may see I([formula]); but [formula] will not appear in I(l) operations, except in a self-cancelling form, e.g. we could use I(l) with [formula] in the ith place, following with I(l) with [formula] in the jth place, so that the product of the two steps has unit determinant.
I (i, j) is, of course, independent of [formula].
Using elementary operations of this kind, we may construct a canonical form of lambda-matrices, which is diagonal, the elements in the diagonal being functions of [formula].
The canonical form is not unique; the best-known form is that due to Smith (see Ref. (1)).
If C([formula]) is the canonical form, then since C([formula]) = PA([formula])Q we have [formula]; and since [formula] are non-vanishing constants, [formula] and [formula] differ only by a scalar multiplier.
Hence the roots of [formula] are identical with those of [formula].
1.16 THE CHARACTERISTIC MATRIX AND ITS EIGENVALUES
Suppose we have a linear substitution Ax = y in which it is required that y be a scalar multiple of x, say [formula]x.
Then [formula]x − Ax = ([formula]I − A)x = 0.
Thus the square lambda-matrix ([formula]I − A), which is the characteristic matrix of A (see 1.8), must be singular; i.e. the determinant [formula] if x is not to vanish.
This important equation is called the characteristic equation.
If A (which is necessarily square) is of order n, then clearly the characteristic equation may be written [formula]. The n roots of this equation, say [formula], are called eigenvalues.
In what follows, we shall now regard [formula]1,..., [formula]n as all different.
An alternative formulation of the characteristic function is evidently [formula] and it is to be observed that the two formulations imply, inter alia, [formula] and [formula] where Theorems II and III of 1.22 have been used.
Let us write [formula]; then from (4), [formula], so that if we regard [formula]1 as typical of all the roots, [formula], since all the roots differ.
Now by the usual rule for differentiation of determinants, [formula] is linear and homogeneous in the first minors of Δ([formula]).
Since, typically, [formula] it follows that not all the first minors of [formula] vanish, so that it is simply degenerate.
For the same reason [formula], the adjugate, is not null, and so has unit rank, as shown below.
Since [formula] vanishes but at least one first minor does not, there must be n  1 independent columns, the remaining column being expressible uniquely in terms of the independent columns.
This means that there will be a vector x1, called an eigenvector, such that [formula] and x1 is uniquely determined, apart from an arbitrary scalar multiplier; that is, the ratios of the elements of x1 to each other are uniquely determined.
Similarly, apart from a scalar multiplier, there will be a unique row vector [formula] such that [formula] It is readily shown that the adjugate [formula] is proportional to the unit rank product [formula].
For suppose [formula]; then [formula]. Postmultiply this by any column p such that the scalar product [formula]; then [formula]. Since the vector x1 is unique apart from a scalar multiplier, we may identify [formula] with x1 and similarly [formula] with [formula].
If we revert to the original problem, the solution of Ax = [formula]x, we now see that there are just n eigenvalues [formula]s and that correspondingly there are just n vectors xs; i.e. we have n equations [formula]. We can combine them all into the single equation [formula], or more briefly AX = X[formula], where X is now the square matrix made up of the n column vectors xs, and [formula] is the diagonal matrix of the eigenvalues; and our problem is now to find the matrices X and [formula] for the given A.
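In computational terms the combined equation reads AX = XΛ, with Λ the diagonal matrix of eigenvalues; a NumPy sketch with an illustrative matrix:

```python
import numpy as np

# A matrix with distinct eigenvalues (values chosen only for illustration).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])

w, X = np.linalg.eig(A)       # eigenvalues and modal matrix
Lam = np.diag(w)              # diagonal (spectral) matrix of the eigenvalues

# The n relations A x_s = lambda_s x_s combine into the single equation AX = X Lam.
assert np.allclose(A @ X, X @ Lam)

# Each column is arbitrary to a scalar multiplier: Xd is also a solution.
d = np.diag([2.0, -1.0, 0.5])
assert np.allclose(A @ (X @ d), (X @ d) @ Lam)
```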
We may note one further point.
Let d be an arbitrary diagonal matrix; then it will permute with the diagonal matrix [formula]. Postmultiply (7) by d; then [formula], showing that Xd is a solution of (7).
This expresses the fact that the columns of X are each arbitrary to a scalar multiplier, since in Xd, x1 is multiplied by the scalar d1 and so on.
If (7) is premultiplied by A, we have A(AX) = (AX)[formula] and AX is apparently an alternative solution for X. But since AX = XΛ it is X postmultiplied by a diagonal matrix, as before, except that here the diagonal matrix is not arbitrary.
The matrix equation [formula] arises perhaps most commonly in the study of the natural frequencies and modes of vibration of an undamped mechanical system having n degrees of freedom.
The matrix A derives from the mechanical properties (mass and stiffness) of the system; [formula]s then determines the square of a natural frequency, and xs the corresponding mode of distortion.
However, the amplitude is not determined; xs is arbitrary to a scalar multiplier.
In this sense an arbitrary multiplier ds can be regarded as an amplitude.
Because of its importance in vibration problems, the matrix X, which is the array of the individual eigenvectors xs, is usually spoken of as the modal matrix, even when the problem from which it derives is not a dynamical one.
Similarly, Λ in vibration problems is a diagonal matrix of squares of natural frequencies and thus defines the frequency spectrum of the system; accordingly it is referred to generally as the spectral matrix.
We may readily show that the matrix X is non-singular; for if it is not, let us first suppose it to be simply degenerate.
Then there will be a vector p, unique apart from a scalar factor, such that Xp = 0.
But in that case premultiplication by A gives 0 = AXp = X[formula]p so that [formula]p provides a relation different from p between the columns of X, which is impossible.
Similarly, if X were doubly degenerate we could have only two vectors p satisfying Xp = 0, but each of these would give rise to other different vectors [formula]p.
Hence X is non-singular.
We may accordingly write (7) as [formula] and A, [formula] are equivalent matrices.
If [formula], then B is said to be derived from A by a similar transformation.
This type of transformation is of great importance.
For example, if [formula] then [formula] where [formula] is the diagonal matrix of the squares of the eigenvalues.
Similarly [formula] and more generally [formula], where P(A) is any polynomial in the matrix A.
We now revert briefly to Equations (2) and (3) in order to define the companion matrix C of [formula]([formula]).
This is [formula] Condensation of [formula] shows that [formula] so that the eigenvalues of A and C are the same.
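The agreement of the eigenvalues of A with those of the companion matrix of its characteristic polynomial is easily verified (the matrix below is illustrative, and one common arrangement of the companion matrix is assumed):

```python
import numpy as np

# An illustrative matrix with distinct eigenvalues.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 5.0]])

p = np.poly(A)                 # monic characteristic polynomial coefficients

# Companion matrix: units in the sub-diagonal, negated coefficients
# of the characteristic polynomial in the last column.
n = len(p) - 1
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)
C[:, -1] = -p[1:][::-1]

# The eigenvalues of A and of its companion matrix are the same.
assert np.allclose(np.sort_complex(np.linalg.eigvals(C)),
                   np.sort_complex(np.linalg.eigvals(A)))
```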
We shall give an example of the utility of (10) later (2.9).
1.17 THE CAYLEY-HAMILTON THEOREM
This important theorem states that any square matrix A satisfies its own characteristic equation.
First, suppose as above that the eigenvalues are all distinct; then in view of (1.16.9) we may write the polynomial [formula](A) as [formula]. Now the diagonal matrix [formula] has for diagonal elements [formula], all of which vanish; i.e. [formula] is null.
Hence [formula] (A) = 0, which is the Cayley-Hamilton theorem.
The equation is true even when the eigenvalues are not all different.
For we know that in general [formula] where [formula] is the adjugate of ([formula]I  A).
Since both [formula] and Δ([formula])I are lambda-matrices, we may write them as polynomials in [formula] with matrix coefficients; let us write the equation as [formula]. Then on multiplying out and identifying coefficients of [formula], we have [formula]. Premultiply the first of these relations by [formula], the second by [formula], etc., and add: the result is [formula](A) = 0.
Equations (1.16.3) and (1.16.4) show that the characteristic equation may be written in the alternative forms [formula] It follows that the Cayley-Hamilton theorem [formula] can be written in the alternative form, in which for consistency each factor has been multiplied by -1, [formula] It will be observed that, with eigenvalues all different, each of these matrices has degeneracy 1, the product of all n factors having degeneracy n: i.e. it is null; this is an illustration of Sylvester's law of degeneracy (Theorem XII of 1.22).
Since [formula](A) vanishes, so also do [formula], etc. It follows that any power n or more of A, or any polynomial of degree n or more in A, can be expressed as a polynomial in A of degree n − 1.
Furthermore, if A is non-singular, [formula] may be expressed as a polynomial of degree n − 1 in A, and therefore also [formula] and higher negative powers.
Examples of the utility of this theorem will appear later.
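A numerical verification of the theorem, and of the resulting expression for the inverse, for an illustrative 2 × 2 matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Characteristic polynomial lam^2 - 5 lam - 2 (trace 5, determinant -2).
p = np.poly(A)                         # coefficients [1, -5, -2]

# Cayley-Hamilton: A satisfies its own characteristic equation.
phi = p[0] * A @ A + p[1] * A + p[2] * np.eye(2)
assert np.allclose(phi, np.zeros((2, 2)))

# Hence, for non-singular A, the inverse is a polynomial of degree n - 1 in A:
# A^2 - 5A - 2I = 0  =>  A^{-1} = (A - 5I) / 2.
Ainv = (A - 5.0 * np.eye(2)) / 2.0
assert np.allclose(Ainv, np.linalg.inv(A))
```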
1.18 SYLVESTER'S EXPANSION THEOREM
We revert to Equation (1.16.9) and write [formula], to conform with the notation for the adjugate of X. Then [formula]. Now P(A) is the diagonal matrix of which the diagonal elements are [formula] in succession.
Once again we here assume the eigenvalues [formula]s to be all different.
We now write [formula] so that each square matrix is null except for a unit in the appropriate diagonal position.
In terms of the isolating vector ei this may be written [formula], and if P(A) is expanded in this way, the first term, which is typical, will be [formula]. The square matrices on the right of these equations are all of unit rank; we may write the last as [formula], where x1 is the first column of X, and [formula] the first row of [formula].
We thus establish Sylvester's theorem for the expansion of P(A), viz. [formula] where [formula] The unit rank matrices [formula] which are known as the constituent matrices of A, have interesting and important properties.
Thus: (i) As has been noted earlier, in 1.16 [formula] is proportional to [formula], where [formula] is the adjugate of ([formula]I  A).
(ii) We defined [formula] as [formula]; hence [formula]. This equation requires, for all the rows [formula] and columns [formula], [formula]. Hence [formula], since the inner product (scalar) is unity.
Hence in general [formula]. On account of this property, [formula] is said to be idempotent. (iii) We have also [formula], since the inner product vanishes.
(iv) If in Sylvester's expansion we put P(A), which is an arbitrary polynomial, equal to I, so that [formula], then [formula]. As a simple example of the consistency of the theorem, let us write Sylvester's expansion for A as [formula]. Let us square both sides; then [formula], and on account of the properties above [formula], in accordance with Sylvester's theorem.
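Properties (i) to (iv) of the constituent matrices may be verified numerically (illustrative matrix with distinct eigenvalues):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])          # distinct eigenvalues 5 and 2

w, X = np.linalg.eig(A)
Y = np.linalg.inv(X)                # rows are the row eigenvectors

# Constituent matrices Z_s = x_s y_s' (each of unit rank).
Z = [np.outer(X[:, s], Y[s, :]) for s in range(2)]

# (ii) Idempotency, and (iii) mutual annihilation.
assert np.allclose(Z[0] @ Z[0], Z[0])
assert np.allclose(Z[0] @ Z[1], np.zeros((2, 2)))

# (iv) P(A) = I gives sum Z_s = I; P(A) = A recovers the expansion of A.
assert np.allclose(Z[0] + Z[1], np.eye(2))
assert np.allclose(w[0] * Z[0] + w[1] * Z[1], A)

# Sylvester's theorem for an arbitrary polynomial, e.g. P(A) = A^2.
assert np.allclose(w[0]**2 * Z[0] + w[1]**2 * Z[1], A @ A)
```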
1.19 ORTHOGONALITY AND BI-ORTHOGONALITY OF THE MODAL MATRIX
We consider first the eigenvalues (supposed all different) and modal matrix of a symmetric matrix A. Then by (1.16.7) AX = X[formula], and on premultiplication [formula]. Transpose this; then since [formula] we obtain [formula]. Comparison of (1) and (2) shows that [formula], so that [formula] permutes with [formula], a diagonal matrix with all its diagonal elements different.
Hence (see 1.9) [formula] must also be diagonal and it then follows that [formula] is also diagonal.
Now we have seen that we can use Xd in place of X, where d is an arbitrary diagonal matrix; and it therefore follows that [formula] is also diagonal.
Now the ith diagonal element of this product is [formula] and we may clearly choose di to make this element unity.
If this is done for all i, and we write Xn for Xd, then [formula], so that the modal matrix Xn, chosen in this way, is orthogonal, and we say that Xn is the normalised form of the modal matrix.
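A numerical check (illustrative symmetric matrix; NumPy's symmetric eigensolver returns the eigenvectors already normalised):

```python
import numpy as np

# A symmetric matrix with distinct eigenvalues.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

w, Xn = np.linalg.eigh(A)    # normalised modal matrix of a symmetric matrix

# The normalised modal matrix is orthogonal: Xn' Xn = I.
assert np.allclose(Xn.T @ Xn, np.eye(3))

# Consequently Xn' A Xn is the spectral matrix directly.
assert np.allclose(Xn.T @ A @ Xn, np.diag(w))
```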
Next consider the more general case when A is not symmetric, but is expressed as the product of two symmetric factors.
Such factorisation is always possible, in a number of ways; for example, since AX = XΛ, then [formula], so that A has been factorised into two symmetric matrices; however, this particular factorisation presupposes a knowledge of X and [formula].
Suppose instead that we can write A as [formula] where B, C are symmetric; then we have [formula] or BX = CX[formula].
Dynamical problems frequently arise in this form directly.
Now as before, if we premultiply by [formula] and transpose, then since [formula] and [formula], [formula], so that [formula] permutes with [formula] and is thus diagonal, whence it follows that [formula] is diagonal also.
Clearly we can not normalise X in this case to make both [formula] and [formula] unit matrices; we say therefore that in this case X is bi-orthogonal or has a generalised orthogonal form.
We may pursue the matter a little further, however.
If as previously we write Xd for X, where d is an arbitrary diagonal matrix, then we obtain [formula] as diagonal matrices.
Now the ith diagonal element of [formula] is [formula], where the bracketed term is the quadratic form obtained from C with the ith eigenvector xi.
This quadratic form is not, as previously, a sum of squares; but provided it is not zero we could choose di to make the element unity, when we should have (with X for Xd) [formula]. On account of the properties established here, it is usual to say that the modes of a system are orthogonal to each other.
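The bi-orthogonality property may be checked numerically. In the sketch below, B and C are illustrative symmetric matrices (C also taken positive definite) and A = C^{-1}B:

```python
import numpy as np

# A = C^{-1} B with B, C symmetric; values are illustrative only.
B = np.array([[4.0, 1.0], [1.0, 3.0]])
C = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.linalg.inv(C) @ B            # A itself is not symmetric

w, X = np.linalg.eig(A)

# Bi-orthogonality: X'CX (and hence X'BX) is diagonal.
XtCX = X.T @ C @ X
XtBX = X.T @ B @ X
off = ~np.eye(2, dtype=bool)
assert np.allclose(XtCX[off], 0.0)
assert np.allclose(XtBX[off], 0.0)

# Choosing each d_i so that x_i'C x_i = 1 gives X'CX = I
# and X'BX equal to the spectral matrix.
d = 1.0 / np.sqrt(np.diag(XtCX))
Xn = X * d
assert np.allclose(Xn.T @ C @ Xn, np.eye(2))
assert np.allclose(Xn.T @ B @ Xn, np.diag(w))
```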
1.20 CANONICAL FORMS OBTAINED BY SIMILAR TRANSFORMATION
We consider a matrix A, which for the moment we regard as having distinct eigenvalues.
We have seen (1.12) that by elementary operations we can perform an equivalent transformation C = PAQ and C has a special canonical form, either the unit matrix or containing a diagonal unit submatrix.
Let us now impose the restriction that whenever we perform an elementary postmultiplying operation to make up Q, we premultiply by the inverse operation to make up P, so that [formula].
In this case, we can not reduce C to a diagonal matrix of units, since evidently [formula].
However, a simple and powerful canonical form is obtainable.
The matrix A has the characteristic matrix ([formula]I  A).
If this is premultiplied by [formula] and postmultiplied by Q, then [formula] so that A and [formula] have the same eigenvalues.
In particular, if we identify Q with the modal matrix X, then [formula] by (1.16.8).
Hence, as is otherwise obvious from the derivation of (1.16.8), the eigenvalues of A are given directly by [formula], etc.
In this case, therefore, we may regard [formula] as the simple canonical form for a matrix having distinct eigenvalues.
Note that if [formula] and [formula] then [formula] so that, if X is the modal matrix of A, QX is the modal matrix of B.
The great majority of mechanical problems give rise to matrices having distinct eigenvalues; however, problems involving equal eigenvalues are by no means unknown.
For example, a rigid body may have some of its motions constrained by springs, but two (or more) unconstrained; it will then have two (or more) equal zero frequencies.
The canonical form obtained by a similar transformation of a matrix having equal eigenvalues, when reduced to its simplest form, has the eigenvalues in the principal diagonal, with equal eigenvalues in groups; elsewhere the matrix is null except that there may be units in the superdiagonal.
Thus, for example, a matrix of order 6 having three equal eigenvalues α, two equal eigenvalues β, and an unrepeated eigenvalue γ, would have as its canonical form [formula] where each x may be zero or unity; it occurs only within the submatrices containing groups of equal eigenvalues.
C is sometimes called the Jordan canonical form and a submatrix such as [formula] is known as a Jordan block.
If in C a unit occurs at a position x, it can not be removed by a similar transformation.
This is readily proved as follows.
Suppose we wish to remove the unit from the superdiagonal in the matrix [formula] by means of a similar transformation; we can write the most general transformation matrix as [formula]. Then [formula], and if both off-diagonal elements are to vanish, then [formula]. Provided [formula] we may satisfy (4) and [formula]: for example, if we choose [formula].
Then [formula]. However, if [formula] then the only solution of (4) is c = d = 0, when ad − bc vanishes, so that [formula] does not exist.
Hence, if a unit exists (it may not) in the superdiagonal element adjacent to two equal eigenvalues, it can not be removed by a similar transformation.
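The argument may be checked numerically: the degeneracy of the characteristic matrix at the repeated eigenvalue is invariant under a similar transformation, so the single eigenvector of a Jordan block persists (values are illustrative):

```python
import numpy as np

# Jordan block for a repeated eigenvalue alpha = 2 (order 2).
Jblock = np.array([[2.0, 1.0],
                   [0.0, 2.0]])

# (J - 2I) has rank 1, so there is only one independent eigenvector:
# the superdiagonal unit reduces the degeneracy by unity.
assert np.linalg.matrix_rank(Jblock - 2.0 * np.eye(2), tol=1e-8) == 1

# Contrast with the diagonal case, which is doubly degenerate at lambda = 2.
D = 2.0 * np.eye(2)
assert np.linalg.matrix_rank(D - 2.0 * np.eye(2), tol=1e-8) == 0

# Any matrix similar to the Jordan block shares this behaviour:
# rank(T^{-1} J T - 2I) = rank(J - 2I) = 1 for non-singular T, so no
# similar transformation can remove the superdiagonal unit.
rng = np.random.default_rng(1)
T = rng.standard_normal((2, 2)) + 2.0 * np.eye(2)   # almost surely non-singular
Asim = np.linalg.inv(T) @ Jblock @ T
assert np.linalg.matrix_rank(Asim - 2.0 * np.eye(2), tol=1e-8) == 1
```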
The reason for the existence or absence of a unit in the superdiagonal is easy to see.
Suppose C in (3) is derived by a similar transformation from a matrix A, to which it is therefore equivalent.
Then if [formula]I − A is simply degenerate when [formula] = α, so also is [formula]I − C. There must in consequence be two units in the top two places of the superdiagonal, for without them [formula]I − C would have three null rows and three null columns, and so be triply degenerate.
With two units present, only the first column and third row are null, so that [formula]I − C is simply degenerate as required.
Evidently the introduction of a unit reduces the degeneracy of [formula]I − C by unity.
It is clear from the derivation of the canonical form that it is conventional, but not necessary, to put equal eigenvalues in groups, and to put units in the superdiagonal.
In a practical case, we might find an intermediate form for the matrix C of (3) to be (we shall discuss the meaning of x and y) [formula]. Now suppose [formula], and therefore [formula], to be simply degenerate when [formula]. Then [formula] must also be simply degenerate.
This requires that either x or y is nonzero, the other being zero; only so does the matrix have only one null column and row.
Suppose x is zero, so that the second row and fifth column are null.
Then y must not vanish.
Now, by further operations of type I(i, j) we can bring the two roots into adjacent places in the diagonal, when −y will move to the infradiagonal and a zero, replacing x, will appear in the superdiagonal.
A similar transformation of the submatrix then gives [formula] so that, so far as [formula] is concerned, we have arrived at the conventional canonical form.
The conventional form is generally convenient; it is useful to have equal eigenvalues in groups and to use the superdiagonal; but in particular cases it may be well to use a non-zero quantity other than a unit in the superdiagonal.
For example, if the eigenvalues are numerically small, a unit could dominate the spectral matrix; or if they are very large, a unit could be of the order of the errors in the computation.
It would then be better to choose a quantity of the order of magnitude of the repeated eigenvalue, or even the eigenvalue itself, provided this is not zero.
1.21 SOME REMARKS ON EQUAL EIGENVALUES
We have already remarked that the case of equal or confluent eigenvalues is not common in practical dynamical problems; however, when it occurs, it can cause difficulties.
We shall not attempt to deal with the full problem of multiple equal eigenvalues; it will be sufficient in general to consider the case of two equal roots [formula]1 = [formula]2 as typical.
Extension to more complicated cases is not difficult.
(a) Extension of the Cayley-Hamilton theorem
In 1.17 we showed that a matrix A satisfies its own characteristic equation, i.e. [formula], irrespective of the nature of the eigenvalues.
Now when [formula], the characteristic matrix ([formula]I − A) may be simply or doubly degenerate, according to the nature of its canonical form [formula]. If it is simply degenerate (i.e. its canonical form includes a unit in the superdiagonal) then [formula](A) = 0 is in fact the vanishing polynomial in A of lowest degree.
But if it is doubly degenerate, there is a reduced characteristic equation [formula], of one degree lower than [formula].
Consider the equation [formula], where [formula] is the adjugate of ([formula]I − A). If for [formula] the matrix ([formula]I − A) is doubly degenerate, then [formula] is null, since all first minors of ([formula]I − A) vanish.
Accordingly, [formula] must have the factor ([formula] − [formula]) in each element, while [formula]([formula]) has two such factors.
If we cancel one factor from both sides, we can write [formula] and then, as before, we can prove that [formula].
An alternative expression is evidently [formula] where the first factor is doubly degenerate; the remainder are simply degenerate, giving as before a total degeneracy n.
But if the first factor is simply degenerate, it must appear twice.
We may illustrate this with a simple numerical case.
Let [formula] Put [formula] = 2 and [formula] in the characteristic matrix [formula] then [formula] since by inspection it is doubly degenerate.
Also [formula] which is simply degenerate; the sum of the columns vanishes, but first minors do not.
By inspection [formula] satisfying the reduced characteristic equation.
However, when we put [formula], [formula] in the characteristic matrix [formula], we obtain [formula], both of which are simply degenerate; it is readily checked that [formula], which satisfies the full characteristic equation.
If a reduced characteristic equation exists, so that [formula], then [formula]... also vanish; and, as in 1.17, we may now express powers of A or polynomials in A as polynomials in A of degree one less than that of [formula].
(b) Nature of modal matrix
We now turn to the confluent form of Equation (1.16.7), viz. AX = XΛ, and we shall deal with this by using the example of the numerical matrices A and B above.
By inspection of (3) we can see that a column vector xi postmultiplying (2I − A) to give a null result must be such that [formula] vanishes; two obvious vectors achieving this are {1, 2, 4} and {1, 0, −1}.
Also if (I − A) is postmultiplied by {1, 1, 1} it vanishes.
Hence we may write (8) as [formula] so that, in this case (two equal roots and double degeneracy of the characteristic matrix), the form of (8) is unaltered.
However, it should be noted that, while the first two columns of X are arbitrary to a scalar multiplier, they are also obviously arbitrary to any linear combination; i.e. we may postmultiply the equation by a matrix [formula] in which [formula] and [formula] are arbitrary except that D must be non-singular.
The leading second-order submatrix of D corresponds to the scalar submatrix in Λ and therefore permutes with it, so that as before [formula] and XD is a solution of (8).
Now consider the matrix B, which yields two equal eigenvalues but simple degeneracy.
Since 2I - B and I - B are both simply degenerate we can find only one vector for each to satisfy [formula]; they are respectively (see (6)) { 1, 2, 4 } and { 1, 1, 1 }.
No other vector (apart from scalar multiples) satisfies the equation.
It follows that we can not make up a square matrix X of eigenvectors to satisfy (8); for this reason the matrix B is said to be defective.
However, we may note that { 1, 2, 4 } and { 1, 1, 1 } are the first and third columns of the modal matrix X appropriate to A. If we try this X for B we find that we obtain [formula] or [formula], where [formula] is now the diagonal matrix of the eigenvalues with the addition of a unit in the top superdiagonal place.
Evidently Λ is the canonical form of A, and [formula] that of the defective matrix B. In this latter case, a postmultiplying non-singular matrix analogous to (9) which will permute with [formula] is [formula], where [formula] and [formula] are otherwise arbitrary.
Then [formula] satisfies (10).
We may note that, for both the matrices A and B, [formula]; we may therefore construct the canonical forms directly from [formula]. We now revert to (10); if for convenience we denote the columns of X by [formula] respectively, then, as we have said above, only x1 and x3 are eigenvectors; they make [formula] and [formula] vanish, and are each unique to a scalar multiplier.
We now examine the origins of [formula], which in relation to B we call an auxiliary vector.
In (10) we find [formula]. Premultiplication of this by (2I - B) yields [formula]. Accordingly, postmultiplication by the auxiliary vector [formula] annihilates [formula]; in fact, since [formula] is doubly degenerate, there will be two linearly independent postmultiplying vectors which annihilate it, separately or in linear combinations.
We can, and indeed must, choose x1 as one of these, for since [formula] vanishes, so evidently does [formula]; and then [formula] is the other.
We might have anticipated this result from Equations (5) and (7).
In (5) the doubly degenerate matrix (2I - A) is annihilated by two linearly independent eigenvectors and the simply degenerate (I - A) by the third.
In (7), [formula] is doubly degenerate and so corresponds with (2I - A); however, in this case one eigenvector and one auxiliary vector emerge.
It is now clear that, to make up X, we require (i) an eigenvector x1 satisfying [formula]; (ii) an auxiliary vector [formula] satisfying [formula]; (iii) an eigenvector x3 satisfying [formula]. In the numerical example (10), [formula], and postmultiplication by x1 = { 1, 2, 4 } makes both vanish; postmultiplication by [formula] = { 1, 0, -1 } annihilates the second but not the first; the product yields -x1.
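Conditions (i)-(iii) can be checked numerically. The B used below is a stand-in rebuilt from the eigendata quoted in the text (an assumption, since the book's numbers are elided in this extract):

```python
import numpy as np

# Stand-in defective B (assumption: rebuilt from the quoted eigendata)
X = np.array([[1., 1., 1.],
              [2., 0., 1.],
              [4., -1., 1.]])
J = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 1.]])
B = X @ J @ np.linalg.inv(X)
I = np.eye(3)

x1 = np.array([1., 2., 4.])    # eigenvector of the repeated root 2
x2 = np.array([1., 0., -1.])   # auxiliary vector
x3 = np.array([1., 1., 1.])    # eigenvector of the simple root 1

print(np.allclose((2*I - B) @ x1, 0))              # (i)   True
print(np.allclose((2*I - B) @ (2*I - B) @ x2, 0))  # (ii)  True: killed by the square
print(np.allclose((2*I - B) @ x2, 0))              # ...but not by (2I - B) itself: False
print(np.allclose((I - B) @ x3, 0))                # (iii) True
print((2*I - B) @ x2)                              # equals -x1
```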
There is one further point of importance: [formula] is not unique, but contains an arbitrary element.
As we have seen, [formula] is satisfied by two linearly independent vectors (in our example, { 1, 2, 4 } and { 1, 0, -1 }) and therefore by any linear combination of them.
One is chosen to be x1, satisfying (2I - B)x = 0; the other, [formula], is another arbitrary combination.
However, it must satisfy [formula] so that if x1 is multiplied by a scalar d1 so also is [formula]; which is why (11) contains equal elements [formula].
But the equation is also satisfied by [formula] so that in our example [formula] with [formula] arbitrary.
This result is evidently generally true.
As a dynamical example, perhaps the commonest practical problem leading to a double eigenvalue with simple degeneracy is that of critical damping.
For example, the equation of motion of a mass constrained by a spring and dashpot is, in the well-known standard form, [formula] Here we shall consider [formula] as fixed and [formula] as a variable (in practice, a variable spring stiffness) and for convenience we shall write [formula] for [formula]; this [formula] is a variable parameter, which may be positive or negative: in the latter case, the motion is a damped oscillation.
If x is proportional to exp[formula]t, we have to satisfy [formula] and for [formula] we therefore have two different roots: either two real roots or a conjugate complex pair.
However, for [formula] (the critical damping case) the two roots coalesce into a repeated (real) root, [formula] = [formula]
If we write v for dx/dt, the equation of motion may be written as the first-order pair [formula] or, when { x, v } is proportional to exp [formula] t, [formula]. This requires [formula], giving of course the eigenvalues [formula], for each of which [formula] is simply degenerate.
In the critical case, when [formula] vanishes, we have two equal eigenvalues [formula], but [formula]I - B(0) is still simply degenerate; thus [formula] and so the column [formula] makes this vanish.
Squaring it, we obtain a null matrix, so that [formula] is satisfied by two linearly independent arbitrary columns.
As we have seen, one of them must be [formula] the most general form of the other is [formula].
Choosing [formula] we find Equation (13) to be, in this case, [formula] with the spectral matrix (on the right) in the standard canonical form for a defective matrix.
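The coalescence can be seen numerically. The text's symbols are elided here, so as an illustration take the hypothetical equation x'' + 2x' + kx = 0 with the spring stiffness k as the variable parameter; critical damping then occurs at k = 1:

```python
import numpy as np

# Illustrative first-order pair for x'' + 2x' + k*x = 0 with v = dx/dt
# (hypothetical coefficients; the text's own symbols are elided)
def system(k):
    return np.array([[0., 1.],
                     [-k, -2.]])

print(np.linalg.eigvals(system(0.75)))  # two distinct real roots
print(np.linalg.eigvals(system(2.0)))   # a conjugate complex pair

# Critical case k = 1: the repeated root -1, yet the characteristic matrix
# is still only simply degenerate, so the system matrix is defective
B = system(1.0)
M = -1.0*np.eye(2) - B
print(np.linalg.matrix_rank(M))         # 1, not 0: simple degeneracy
print(np.allclose(M @ M, 0))            # True: its square is null
```

Only one eigenvector exists at the critical value, exactly as described for the matrix B(0) in the text.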
There is a further interpretation of this which is of great importance.
With [formula] general, we may write the eigenvectors corresponding to [formula], [formula] respectively; hence the operation [formula] and [formula] is converted to the standard canonical form for matrices with distinct eigenvalues.
However, this similar transformation fails when [formula]: the two columns of X(0) become identical, so that its reciprocal becomes infinite.
However, if we postmultiply [formula] by a matrix which in effect sums and differences the columns, thus: [formula] then a similar transformation [formula] yields [formula] as follows: [formula] i.e. a quasi-canonical form: it is to be remembered that the standard canonical forms are conventional, but not unique.
The two matrices [formula] and [formula] are both equivalent to [formula] and to each other; they are each derived from [formula] by a similar transformation.
But the first is possible only for [formula], while the second becomes a canonical form appropriate to a defective matrix only for [formula].
This example shows that the auxiliary vector { 0, 1 } is in effect the differential with respect to [formula] of the two confluent vectors [formula] and [formula].
(c) Extension of Sylvester's expansion
We come finally to the confluent form of Sylvester's expansion for any polynomial, and again we use the matrices A and B as exemplifying the general case; however, for greater generality we shall write the repeated roots as [formula], the unrepeated root as [formula], and shall replace the unit in the superdiagonal of [formula] by r.
Then, treating A first, we have as in 1.18 [formula]. The expansion can then proceed as before, and we obtain [formula], where [formula] are unit-rank matrices.
The case of B, involving a double root with simple degeneracy, is a little more complicated.
In succession, we find [formula]. Now P([formula]) is the polynomial in [formula], [formula], which on differentiation gives [formula]. We can evidently construct corresponding polynomials in the matrices B and [formula], and in view of (13) they will be connected by [formula]; in (17), the first column of X is proportional to [formula].
Using (15), we may expand [formula] in the form [formula]. Accordingly, if we now proceed (exactly as in 1.18) to premultiply (17) by X and postmultiply by [formula] ([formula]; the first row of [formula] is proportional to r), we obtain [formula]. Comparison with (14) shows that we have now added a new term [formula], which derives from the element r in the superdiagonal of the canonical form.
This also is a square matrix of unit rank, and z12 evidently has the properties [formula] In fact, the above equations, and the properties of the matrices [formula] discussed in 1.18, are all contained in the general result [formula] on evaluation of the inner products.
Equation (18) is the confluent form of Sylvester's expansion for the case of two equal eigenvalues which give only simple degeneracy to the characteristic matrix.
As an example of its consistency, if we put P(B) = I, giving [formula], then as before [formula]; while if P(B) = B, then [formula]. If we multiply (21) and (22), the result repeats (22) in view of the properties of zii and (19) above; squaring (22) yields [formula], in accordance with (18).
As an indication of Sylvester's expansion for more complicated cases, suppose we have [formula] Then if we evaluate [formula], it is readily found that the expansion in this case will be [formula] Further extensions follow this pattern.
Example 1
We use B as defined in (2) together with [formula] where, in order to obtain r = 2 in [formula], the first column of X in (10) has been halved, and the first row of YT in (12) has been doubled.
These matrices satisfy [formula].
Suppose we wish to calculate B-1.
Then in (18) we put [formula] with [formula] Then [formula] The reader is invited to check that this is correct.
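Since the expansion (18) and the matrices are elided in this extract, the invited check can be sketched by the equivalent Jordan-form route for a function of a defective matrix: P goes on the diagonal and its derivative P' in the superdiagonal place. B is a stand-in rebuilt from the eigendata quoted earlier (an assumption).

```python
import numpy as np

# Stand-in defective B (assumption: rebuilt from the quoted eigendata)
X = np.array([[1., 1., 1.],
              [2., 0., 1.],
              [4., -1., 1.]])
J = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 1.]])
B = X @ J @ np.linalg.inv(X)

# For P(lam) = 1/lam: P on the diagonal, P' in the superdiagonal place
f  = lambda t: 1.0/t
fp = lambda t: -1.0/t**2
fJ = np.array([[f(2.), fp(2.), 0.],
               [0.,    f(2.),  0.],
               [0.,    0.,     f(1.)]])
Binv = X @ fJ @ np.linalg.inv(X)
print(np.allclose(Binv @ B, np.eye(3)))   # True: this really is the inverse
```

The derivative term in the superdiagonal is exactly the counterpart of the new term the text attributes to the element r of the canonical form.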
1.22 SOME MISCELLANEOUS THEOREMS
We append here some useful theorems and definitions:
Theorem I: The reciprocal of the transpose of a matrix A is equal to the transpose of the reciprocal of A. For [formula]; transpose to get [formula].
Postmultiply by the reciprocal of AT, and [formula].
We shall write [formula].
Corollary If A is symmetric, so also is A-1.
Example I: If [formula]. Also [formula].
Theorem II: The trace of a matrix, which is the sum of the elements in the principal diagonal, equals the sum of the eigenvalues.
If the determinant [formula] of order n is expanded in the usual way, the first product is [formula] times its cofactor, which is similar to [formula] but of order n - 1.
All other products are evidently of degree n - 2 in [formula] at most.
By progressive expansion of the diagonal minors it therefore appears that the two leading terms in the full expansion are [formula]. Now if we write the determinant as [formula], the leading terms are [formula]. Hence [formula].
Example II: Let [formula] and on expansion [formula]. Then [formula].
Theorem III: The product of the eigenvalues of A is [formula].
In [formula] put [formula].
Then [formula].
Hence if an eigenvalue of A is zero, A is singular; and conversely if [formula], then one or more eigenvalues are zero.
Example III If A is the matrix in Example II, then [formula] Theorem IV  If a scalar [formula] is added to each principal diagonal element of A, the eigenvalues of A are each increased by [formula].
This is immediately obvious from the identity [formula] The sum of the eigenvalues, and tr A, are each increased by [formula] Example IV Let us add -3 to each diagonal element of the matrix A in Examples II and III; we obtain [formula] and [formula].
Hence, compared with Example II, the eigenvalues have each been reduced by [formula] and [formula].
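Theorems II-IV are easy to confirm numerically; the sketch below uses an arbitrary sample matrix (illustrative values, since the book's Example II matrix is elided here):

```python
import numpy as np

A = np.array([[4., -5., 2.],     # an arbitrary sample matrix
              [2., -3., 2.],
              [2., -5., 4.]])
lam = np.linalg.eigvals(A)

print(np.isclose(np.trace(A), lam.sum()))        # Theorem II: trace = sum
print(np.isclose(np.linalg.det(A), lam.prod()))  # Theorem III: det = product

mu = -3.0                                        # Theorem IV: shift the diagonal
lam_shift = np.linalg.eigvals(A + mu*np.eye(3))
print(np.allclose(np.sort(lam_shift.real), np.sort(lam.real) + mu))
```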
Theorem V: Any matrix can be expressed as the product of two symmetric matrices.
The case in which A is non-defective has already been treated in 1.19, so that here we must consider the defective case.
We write [formula] Suppose we have a Jordan block or blocks such as [formula] in [formula].
If this is premultiplied and postmultiplied by the reversing matrix Js, then [formula].
Now let K be the unit matrix, except that wherever a block [formula] occurs in [formula], the corresponding Is is replaced by Js in K. Then [formula] We now revert to (1), postmultiply by [formula] and use (2): [formula] on use of the transpose of (1).
Now (3) shows that AXKXT equals its own transpose and so is symmetric, say S1.
Also XKXT is symmetric, say S2.
Thus [formula] is expressible as the product of two symmetric matrices.
If we put [formula] and hence K = I, the two factors in (4) reduce to those given in 1.19.
This establishes that such factorisation is always possible.
A general numerical procedure, not involving X, [formula], is given later in 2.10.3.
Example V: (a) Non-defective matrix. Let A be as defined in (1.21.1).
Then [formula] assumes the numerical values [formula] In this case [formula] and [formula] The reader may check that [formula].
(b) Defective matrix. We use B as defined in (1.21.2).
Then [formula] takes the numerical values [formula] X being the same as in case (a).
Also if [formula] then [formula] while [formula] and again the reader may check that the product of the symmetric matrices [formula].
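The construction of Theorem V can be verified numerically for the defective case. Here B is a stand-in rebuilt from the eigendata quoted earlier (an assumption), and K is the unit matrix with the reversing block Js placed where the Jordan block occurs:

```python
import numpy as np

# Stand-in defective B (assumption: rebuilt from the quoted eigendata)
X = np.array([[1., 1., 1.],
              [2., 0., 1.],
              [4., -1., 1.]])
J = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 1.]])
B = X @ J @ np.linalg.inv(X)

K = np.array([[0., 1., 0.],    # reversing block J2 over the Jordan block,
              [1., 0., 0.],    # unit elsewhere
              [0., 0., 1.]])
S2 = X @ K @ X.T               # symmetric
S1 = B @ S2                    # also symmetric, by (3)
print(np.allclose(S1, S1.T), np.allclose(S2, S2.T))   # True True
print(np.allclose(B, S1 @ np.linalg.inv(S2)))         # True: B = S1 * inv(S2)
```

Since the inverse of a symmetric matrix is symmetric, this exhibits B as the product of two symmetric matrices.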
Theorem VI: AT can be obtained from A by a similar transformation.
This has been proved incidentally above; (3) may be written [formula].
Example VI: We use the matrix B of Example V(b) above.
Then [formula] Theorem VII  If A is any matrix of order (n  p) and B any matrix of order (p  n) where [formula], then the eigenvalues of AB are those of BA plus n  p zeros.
Consider the matrix product [formula] where the matrices are square and of order N = n + p, appropriately partitioned.
If we take determinants and note that [formula] we have [formula] In reverse order, we find [formula] so that [formula] and hence
It follows that the zeros of [formula] also make [formula] vanish and that [formula] must also have n - p roots [formula] = 0.
This last is otherwise obvious; A has only p columns, so that AB, of order (n × n), has at most p linearly independent columns; i.e. it is of rank p at most, and so must have at least n - p zero eigenvalues.
If A and B are square, and one at least (say B) is non-singular, this result may be obtained very simply from the identity.
[formula]
Corollary
If A, B and C are matrices of order n × p, p × q and q × n respectively, the non-zero eigenvalues of ABC (n × n), CAB (q × q) and BCA (p × p) are identical.
(N.B. if n = p = q, this corollary applies only to the products ABC, CAB and BCA and not to BAC, etc.; i.e. cyclic permutation only is permitted.)
Example VII
(a) [formula] Here, [formula] The reader will verify that AB is of rank 1 with spectral and modal matrices [formula] while BA is of rank 2 with [formula] The eigenvalues of AB and BA are thus 0, 0 and -7.
It will be noted that tr BA = tr AB = -7 and that Theorem VII applies regardless of the fact that rank [formula] rank BA.
(b) [formula] Here [formula], each product being of unit rank.
The reader will verify that the spectral and modal matrices are (i) For ABC: [formula] (ii) For CAB: [formula] (iii) For BCA: [formula] The non-zero eigenvalue, 47, is common to all products, as indicated by Theorem VII.
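Theorem VII is easily checked on random rectangular factors (the sizes here are chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2
A = rng.standard_normal((n, p))     # n x p
B = rng.standard_normal((p, n))     # p x n

lam_AB = np.linalg.eigvals(A @ B)   # n eigenvalues
lam_BA = np.linalg.eigvals(B @ A)   # p eigenvalues

# the n - p smallest-modulus eigenvalues of AB vanish...
mags = np.sort(np.abs(lam_AB))
print(mags[:n-p].max() < 1e-6)                       # True
# ...so the traces (eigenvalue sums) of AB and BA agree
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True
```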
Theorem VIII: If, in the eigenvalue problem [formula], the matrices B, C are both real and symmetric, and if further C is pos. def. and B non-neg. def., then all eigenvalues are real, finite and non-negative.
In (8) write [formula]; multiply out and separate real and imaginary parts.
Two real equations result: [formula]. If these equations are respectively premultiplied by [formula] and subtracted, then since B, C are symmetric, the left-hand side vanishes, and [formula]. Since x is not null and C is pos. def., the bracketed term is positive; hence [formula] = 0 and [formula] is real.
Equation (8) is now [formula] and since the bracketed term is real, so also is x.
Finally, [formula] and with the given conditions [formula] is therefore real, finite, and non-negative.
If B, C are both pos. def., [formula] is real, finite, and positive.
Corollaries
(i) The eigenvalues of a real, symmetric matrix A are all real and finite, and the eigenvectors are real.
This follows from the theorem by putting B = A, C = I.
(ii) The eigenvalues of a Hermitian matrix are all real.
The relevant equation may be written [formula] which yields the two real equations [formula] Premultiply the first by qT and the second by pT.
Since B is symmetric, the left-hand sides are equal, and since C is skew-symmetric, its quadratic form vanishes (by transposition [formula]).
Subtraction of the two equations therefore yields [formula] (iii) The non-zero eigenvalues of a skew-Hermitian matrix are all imaginary.
Here we premultiply the first of Equations (10) by [formula] and the second by [formula].
Now, since B is skew-symmetric it disappears from the equations, while C is now symmetric, so that the right-hand sides are equal and opposite.
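Corollaries (ii) and (iii) can be confirmed on a small sample (illustrative values):

```python
import numpy as np

H = np.array([[2.0, 1 - 2j],
              [1 + 2j, -1.0]])   # Hermitian: equals its conjugate transpose
S = 1j * H                       # skew-Hermitian

print(np.allclose(H, H.conj().T), np.allclose(S, -S.conj().T))  # True True
print(np.allclose(np.linalg.eigvals(H).imag, 0))  # eigenvalues of H all real
print(np.allclose(np.linalg.eigvals(S).real, 0))  # eigenvalues of S all imaginary
```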
Addition then gives [formula] Theorem IX  If a matrix A and its inverse A-1 are known, then if a unit rank matrix is added to A, the inverse of the new matrix may be written down.
Let us write the unit rank matrix as [formula] Then [formula] Hence [formula] To prove this, we merely multiply out the product [formula] which gives I. Equation (11) is an example of the &quot; modification formula &quot; generally ascribed to Householder (3).
(Householder's formula relates to the more general case where a matrix of rank [formula] is added to A. The additional, modifying, matrix may be written CRT with C of order (n × p) and RT of order (p × n).
Then [formula], in analogy with Equation (11).
The reader might verify this useful formula, in which it is evident that the inverse of a matrix of order n is obtained in terms of that of a matrix of smaller order.)
Examples of the use of this theorem are given in 2.2.3.
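Equation (11) itself is elided in this extract; the rank-one "modification formula" it refers to is usually quoted, for the update A + c r^T, as inv(A + c r^T) = inv(A) - (inv(A) c)(r^T inv(A)) / (1 + r^T inv(A) c), and is easily verified numerically (random illustrative data):

```python
import numpy as np

# Rank-one modification formula, checked on random data (illustrative)
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + n*np.eye(n)   # kept safely non-singular
Ainv = np.linalg.inv(A)
c = rng.standard_normal(n)
r = rng.standard_normal(n)

denom = 1.0 + r @ Ainv @ c
new_inv = Ainv - np.outer(Ainv @ c, r @ Ainv) / denom
print(np.allclose(new_inv, np.linalg.inv(A + np.outer(c, r))))
```

Note that only the previously known inverse and a few matrix-vector products are needed, which is the practical point of the theorem.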
Theorem X: If A is real and symmetric, it can not be defective.
For suppose it is; let us write its spectral matrix [formula] with a Jordan block [formula] as the leading submatrix of order 2 (it can be part of a larger block); x1 is the corresponding eigenvector and [formula] the auxiliary vector.
Then from the first two columns of [formula] we have [formula] on transposition, since A is symmetric; and [formula] Premultiply (13) by [formula]; then in view of (12) [formula] In corollary (i) of Theorem VII we have shown that if A is real and symmetric, x is real; hence in (14) [formula] and p vanishes; similarly for all other superdiagonal elements.
Hence A is not defective.
Corollaries
(i) If A is not symmetric, but can be written as the product [formula], where B, C are real and symmetric, and in addition C is pos. def., then A can not be defective.
In this case, (12) becomes successively [formula] and (13) yields [formula]. Hence, as before, [formula]. We have proved in Theorem VIII that with these conditions x is real; hence (15) implies p = 0, and as before, A is not defective.
Provided only that A is real, it is always possible to find real symmetric matrices B, C satisfying CA = B (see 2.10.3).
Accordingly, if A is defective, so that [formula], (15) requires that [formula] must vanish.
(ii) A defective matrix can not be similarly transformed into a real symmetric matrix.
For suppose we have a Jordan block as in the theorem, with [formula], and we apply the most general transformation; we obtain [formula] This can not be real and symmetric: it requires c = d = 0, which violates the condition [formula].
Example X
We now illustrate Equation (15).
(a) Non-defective matrix. We choose the matrix defined in (1.21.1), and write it here as A1.
Then, with [formula], one possible numerical expression is [formula]. Here B, C are real and symmetric, and C is pos. def.; A1 has two equal eigenvalues, 2, and two eigenvectors { 1, 2, 4 } and { 1, 0, -1 }.
These give [formula].
With p = 0 these satisfy (15).
(b) Defective matrix. We use the matrix defined in (1.21.2), writing it here as [formula].
A possible choice for C, B now yields [formula] As in (a), [formula] has two equal eigenvalues, 2, but here possesses only the one eigenvector, { 1,2,4 }.
With this vector [formula].
This permits [formula] in (15).
Theorem XI  The eigenvalues and vectors of a complex matrix may be obtained by treating a related real matrix.
Suppose [formula] where [formula] are all complex.
There will be a conjugate relation [formula] Write A = B + iC, with B, C real; then [formula] These equations are contained, twice, in the single equation [formula] Hence the real matrix [formula] has the eigenvalues [formula] of (16), and the corresponding eigenvectors [formula]
Example XI
Let [formula] Then the numerical form of (12) is [formula] so that the eigenvalues and vectors of A are 1  i, { -1,2 + i } and 2 + 3i, { -1,1 -i }.
In the alternative form the system matrix, modal matrix and spectral matrix of (17) are, respectively, [formula] and the reader is invited to check that these matrices satisfy (17).
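Theorem XI can be sketched on a small sample A = B + iC (values chosen purely for illustration): the related real matrix of order 2n carries the eigenvalues of A together with their conjugates.

```python
import numpy as np

B = np.array([[1., 2.],
              [0., 3.]])
C = np.array([[0., -1.],
              [2.,  1.]])
A = B + 1j*C                      # the complex matrix

M = np.block([[B, -C],
              [C,  B]])           # the related real matrix, order 2n

lam_A = np.linalg.eigvals(A)
lam_M = np.linalg.eigvals(M)
# every eigenvalue of A (and its conjugate) appears among those of M
ok = all(np.abs(lam_M - z).min() < 1e-8 for z in lam_A)
print(ok)                         # True
```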
Theorem XII: Sylvester's Law of Degeneracy. The degeneracy of the product of two square matrices is at least as great as the degeneracy of either factor, and at most as great as the sum of the degeneracies of the factors, or the order of the matrices, whichever is less.
We consider the product C = AB, where A, B have degeneracies p, q respectively.
Without loss of generality we may take [formula]; for, since the rank of a matrix is unaltered by transposition, we may consider the product either as C = AB or CT = BTAT, with the matrix of greater degeneracy on the right.
We also recall the result proved in 1.12: if a matrix of rank r is multiplied by a non-singular matrix, the product has rank r.
This is because the non-singular matrix may be regarded as made up of elementary operations, and these can not change the rank of a non-vanishing minor of order r.
This is a special case of Sylvester's law: the degeneracy is at least n - r and at most n - r; i.e. it is exactly n - r.
We now turn to the general case, and approach it by means of a simple example.
Let A, B be standard canonical forms, with n - p, n - q consecutive units in the respective diagonals, beginning at top left, and with zeros elsewhere.
Since [formula], only the first n - q units in A are multiplied by units in B, i.e. the product is identical with B and is of degeneracy q.
Now move the units in A one place down the diagonal.
The leading element in the product now vanishes, i.e. the degeneracy is now q + 1.
Proceeding in this way, we finally have all the units in A at the lower end of the diagonal, with p zeros at the top, so that the degeneracy of the product is now p + q (or, if [formula], the product is null).
This example not only illustrates Sylvester's law; it also makes the important point that the degeneracy of the product depends on the relative positions of the units, and therefore of the linearly independent columns, in the two factors.
Now let A, B be general, and let the equivalent transformations which give them canonical form be PAQ, RBS, where P, Q, R, S are non-singular.
Then we may write [formula], and D has the same degeneracy as C. The matrix [formula] will in general be fully populated; if it is appropriately partitioned, we may write (18) as [formula]. To be conformable with the first and last matrices in the triple product, [formula] must have n - p rows and n - q columns.
The final matrix then shows that D has p null rows and q null columns; since [formula], D has degeneracy q at least.
This establishes the first part of the theorem.
Whether or not D has greater degeneracy than q clearly depends on the number of linearly independent columns in [formula]. To examine this question, consider the submatrix [formula]. The n - p rows of this submatrix, being part of [formula], a non-singular matrix, are necessarily linearly independent.
In consequence, [formula] must have just n - p linearly independent columns, making up a non-vanishing minor of order n - p; the other p columns are either linear combinations of some or all of the columns of the minor, or null (a special combination with all coefficients zero).
First, suppose [formula].
Now [formula] has n - q columns, the minor n - p columns: separately, a total of 2n - (p + q).
But [formula] has only n columns; hence [formula] and the minor have at least n - (p + q) common columns, i.e. [formula] has at least n - (p + q) linearly independent columns.
The remaining p columns of [formula] may themselves be columns of the minor, or may be linear combinations of some or all of the columns of the minor, or may be linear combinations of the n - (p + q) common columns, or may be null.
At the extremes, all columns of [formula] or n - (p + q) columns of it will be linearly independent, giving D degeneracies of q and p + q respectively; any intermediate degeneracy is clearly possible.
Finally, suppose [formula].
Then, as before, at one extreme, [formula] can contain n - q columns of the minor, which are linearly independent; at the other extreme, the minor could be wholly contained in [formula] and [formula] could be null, when D has degeneracy [formula].
This establishes the second part of Sylvester's theorem.
Examples XII
Although in a numerical case we can find the degeneracy of AB by evaluating and studying C, it is instructive to examine (18) in numerical cases.
We give two examples.
(i) Let [formula]. Here A is of degeneracy 1, B and C each of degeneracy 2.
Now, by building up P and Q by elementary operations (see 1.12), we obtain [formula], giving the canonical form for A. Similarly, for B we obtain [formula]. Inverting Q and R, we find [formula]. Hence D is given by [formula], and D is of degeneracy 2.
Note that here the submatrix [formula] is [formula] and we can take the first two columns as linearly independent; the third is obtained by subtracting half the first from the second.
(ii) If we transfer the last column of A to the first position, we have a new matrix A; with B unchanged we then have [formula] A is again of degeneracy 1, B of degeneracy 2; but here C is of degeneracy 3.
We leave it to the reader to follow the steps of Example (i), merely noting that, in the present example, [formula] is [formula] with a null column in the first position.
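The canonical-form argument of the proof can be played out in numbers (illustrative diagonal matrices): the position of the units in the factors decides where within Sylvester's bounds the degeneracy of the product falls.

```python
import numpy as np

def degeneracy(M):
    return M.shape[0] - np.linalg.matrix_rank(M)

A  = np.diag([1., 1., 0.])   # degeneracy p = 1
B1 = np.diag([1., 0., 0.])   # degeneracy q = 2, units aligned with A's
B2 = np.diag([0., 0., 1.])   # degeneracy q = 2, unit moved to the bottom

print(degeneracy(A @ B1))    # 2 = q, the lower bound
print(degeneracy(A @ B2))    # 3 = p + q, the upper bound (a null product)
```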
Theorem XIII: A necessary and sufficient condition that a real symmetric matrix A shall be positive definite is that all its leading discriminants shall be positive.
If [formula] is the real symmetric matrix of a quadratic form [formula], then the determinant [formula] is called the discriminant of the form.
The leading minor discriminant of order 1 is [formula] of order 2 is [formula], and so on.
Thus if A is to be pos. def., the theorem requires that [formula], all i.
The vector x is real and non-null but otherwise arbitrary.
First, we put x = e1; then f reduces to [formula], which must be positive if f is pos. def.
Next, put [formula]. Now f reduces to [formula] which, with [formula], requires also that [formula].
We might proceed in this way, but a different approach is simpler.
In the full expansion of f, collect together all the terms involving x1; they may be written as a perfect square, so that [formula] where B is a quadratic form involving [formula],..., xn only.
Now we collect all terms involving [formula] and write them as a perfect square, and so on, so that ultimately we can write the form as [formula] where in fact [formula] and [formula] while y is related to x by the triangular substitution (see (20)) [formula] and [formula].
Hence f = [formula], so that [formula], and [formula]. Now put xn = 0.
Then f may now be written as the quadratic form of which the matrix is the leading minor of A of order n - 1, and we may deduce, corresponding to (22), [formula]. Now, by putting [formula], we establish a similar result for [formula], and so on, so that finally we establish that [formula]. From (21) it is clear that a necessary and sufficient condition that f shall be positive is that Di shall be positive, all i.
It then follows from (23) that all the discriminants [formula] must also be positive; this also is both necessary and sufficient.
This theorem was first established by Sylvester.
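Theorem XIII is easy to check numerically; the sketch below compares the leading discriminants with the eigenvalue test on two sample matrices (illustrative values):

```python
import numpy as np

def leading_discriminants(A):
    return [np.linalg.det(A[:i, :i]) for i in range(1, A.shape[0] + 1)]

P = np.array([[2., -1., 0.],
              [-1., 2., -1.],
              [0., -1., 2.]])    # positive definite
N = np.array([[1., 2.],
              [2., 1.]])         # indefinite: D2 = 1 - 4 < 0

print(all(d > 0 for d in leading_discriminants(P)))  # True
print(np.all(np.linalg.eigvalsh(P) > 0))             # True: agrees
print(all(d > 0 for d in leading_discriminants(N)))  # False
```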
Theorem XIV: If two matrices A, B permute, then provided at least one is non-defective, they share the same modal matrix.
Let [formula], where C is the canonical spectral matrix [formula] of A, and X is the modal matrix of A. X is not unique; if we have any matrix Y which permutes with C, then from (24), AXY = XCY = XYC, so that we can write the modal matrix of A alternatively as XY.
Now write [formula] so that D is a similar transform of B. Then it follows from AB = BA, on use of (24) and (26), that CD = DC.
We now consider three cases.
(a) The eigenvalues of A are all different.
Thus A is non-defective, and C is diagonal, all the diagonal elements being different.
In this case (see 1.9(3)), D must also be diagonal; (26) then shows that D is the spectral matrix of B, which thus shares the modal matrix X with A. Note that D can have equal elements (including zeros) in its diagonal.
(b) A is non-defective, but has some repeated eigenvalues.
Thus C is diagonal; we assume it to be in standard canonical form, i.e. it has equal eigenvalues grouped together: this only requires the columns of X to be written in appropriate order.
Suppose, for example, that C has m equal eigenvalues [formula], in a submatrix of order m in the leading diagonal place: there may be similar submatrices down the diagonal.
Thus C possesses a scalar submatrix [formula] in the leading position; Equation (28) (see 1.9(3) again) now permits D to have a corresponding submatrix, also of order m, arbitrarily populated: thus D will have a block diagonal form.
Now in this case D can be reduced to a standard canonical form E by a similar transformation [formula] where Y has precisely the same block diagonal form as D; in effect, each diagonal block is resolved into its own canonical form.
But in this case, Y will, like D, permute with C; thus from (25) the modal matrix of A may be written XY, while (26) and (29) require [formula] so that B shares the modal matrix XY with A. Note that here, E need not be diagonal, so that B can be defective; but non-zero elements in the superdiagonal of E can only occur in a submatrix corresponding to a scalar submatrix in C.
This establishes the theorem as stated.
(c) For completeness, we look briefly at the case where A is defective.
Suppose, for example, that C has the leading submatrix [formula]; then the most general submatrix of D which will permute with C1 is [formula], where d, e, f are arbitrary.
Thus D1 is in general not of standard canonical form; to make it so we must at least remove f.
Now, while we can find a submatrix Y1 such that [formula], where E1 is of standard canonical form, Y1 is not of the general form D1 and so will not permute with C1.
In this case, A has the modal matrix X, B the modal matrix XY.
Of course, it is possible for B to be such that f = 0, when D1 is in standard canonical form: A and B then share the modal matrix X, though both are defective.
The obvious (trivial) example of this is that of a defective matrix permuting with itself.
But in general, if A and B are both defective, they do not have a common modal matrix.
In the light of the above, it is clear that two matrices having the same modal matrix do not necessarily permute; they will do so only if their spectral matrices permute.
Corollaries
(i) If A, B permute, so do powers of A, B; for example, [formula].
(ii) If A, B permute, and the eigenvalues of A are all different, then B can be expressed as a polynomial in A.
As in case (a) above, here [formula] where C, D are both diagonal.
Now consider the general polynomial [formula]. We can identify P(C) with D; this requires [formula], where [formula] are the diagonal elements of C, and d1, ..., dn those of D. The square matrix in (30) is known as an alternant: its reciprocal is discussed in Ref (P1).
Equation (30) defines the coefficients [formula] such that [formula] which establishes the corollary.
It is capable of extension, but we shall not pursue the matter here.
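Corollary (ii) can be sketched numerically: build a commuting pair on one modal matrix (sample values chosen for illustration), solve the alternant system (30) for the coefficients, and confirm that the resulting polynomial in A reproduces B.

```python
import numpy as np

X = np.array([[1., 1., 0.],    # a sample modal matrix (illustrative)
              [0., 1., 1.],
              [1., 0., 1.]])
Xinv = np.linalg.inv(X)
lam = np.array([1., 2., 4.])   # eigenvalues of A, all different
d   = np.array([3., -1., 5.])  # eigenvalues of B
A = X @ np.diag(lam) @ Xinv
B = X @ np.diag(d) @ Xinv
print(np.allclose(A @ B, B @ A))     # True: they permute

V = np.vander(lam, increasing=True)  # the alternant matrix of (30)
p = np.linalg.solve(V, d)            # coefficients p0, p1, p2 of P
PA = p[0]*np.eye(3) + p[1]*A + p[2]*(A @ A)
print(np.allclose(PA, B))            # True: B is a polynomial in A
```

The alternant is non-singular precisely because the eigenvalues of A are all different, which is why the corollary needs that hypothesis.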
There is a considerable literature concerning permutable matrices; see, for example, Turnbull and Aitken (4).
Theorem XV: Sylvester's "Law of Inertia"
Let a real symmetric matrix A be rendered diagonal by a congruent transformation [formula], where B is, of course, non-singular.
Then the numbers of positive, zero, and negative elements in the diagonal of [formula] are invariant with B.
The number of matrices B which diagonalise A in this manner is indefinitely large: to prove the theorem we need only show that it holds for any two.
Our first choice is B = X, where X is the orthogonal modal matrix of A (being symmetrical, A cannot be defective: see Theorem X) so that [formula], the spectral matrix of A. Our other choice is [formula], where L is the lower triangular matrix having units in its diagonal and its other elements so chosen that LA is upper triangular, when [formula], a diagonal matrix.
It is a simple exercise, which we leave to the reader, to show that the process of triangulating A to LA without row interchanges gives the elements [formula] discussed in Theorem XIII, where the [formula] are the principal discriminants of A and [formula]; an example (for a non-symmetric matrix) is given in [formula]
We first note that if A is of rank r, then so are [formula]; for since [formula], B may be regarded as made up of elementary operations [formula], which cannot change the rank of a matrix.
Thus both [formula] and D will have n − r zeros in their diagonals, which establishes part of the theorem.
To prove the remainder, we suppose the theorem does not hold; let [formula] have p positive and r − p negative elements in addition to its n − r zeros, and D have q positive and r − q negative elements.
Then [formula] where, since we have introduced the negative signs, all non-zero [formula] and [formula] are positive numbers.
We begin by supposing p = q + s, where s is a positive number, so that [formula]; and we examine the general quadratic form [formula].
If [formula] then [formula] These two expressions are equal for all x; subtraction yields [formula] Let us seek a vector x such that [formula] Now from (31), [formula]; thus the first part of (33) provides n − p relations of the form [formula], and the second part, q similar relations: a total of n − p + q = n − s relations.
But x has n elements, so that we can find only n − s of its elements in terms of the remaining s, which are arbitrary (see [formula]; an example is given in 2.5).
Thus a vector x satisfying (33) has s arbitrary elements.
Now substitute (33) into (32); the RHS vanishes, and so, therefore, does the LHS.
With all [formula] [formula] positive this implies, inter alia, [formula] This, taken together with (33), gives y = 0, which in view of (31) gives x = 0.
We thus have two results: x has s arbitrary elements, and x is null. These can only be reconciled if s = 0, when q = p.
If we now take [formula], a similar argument leads to the same result q = p.
Since the argument does not depend on the nature of X and L, the result is generally true, and the theorem established.
The invariant p is known as the index of positiveness, or simply the index, of A and its quadratic form.
Examples XV (a) Let [formula] then its orthogonal matrix X is [formula] which yields [formula] We build up L as the product [formula], where L1 annihilates the last three elements of the first column of A, L2 the last two elements of the second column, and L3 the last element of the third column; then [formula] Then [formula] and the diagonal elements of LA are those of D = [formula] We see that both [formula] and D have two positive and two negative elements.
It may be noted that [formula].
(b) Let us add 2 to each diagonal element of A, to give [formula].
then we know (Theorem IV) that [formula].
We leave it to the reader to construct the corresponding Lo; and to show that [formula] Thus both [formula] and [formula] have two positive, one zero, and one negative element.
Since Ao is singular, so is [formula] which is shown by the null row.
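The content of Theorem XV is easily verified numerically. The following Python sketch (the matrix is an arbitrary illustration, not one from the text) diagonalises a symmetric A by both of the congruent transformations used in the proof and compares the inertias:

```python
import numpy as np

# Law of Inertia: the counts of positive, zero and negative diagonal
# elements of B'AB are the same for every non-singular B.
A = np.array([[2.0,  1.0, 0.0],
              [1.0, -3.0, 2.0],
              [0.0,  2.0, 1.0]])

# First choice: B = X, the orthogonal modal matrix, giving the spectral matrix.
lam, X = np.linalg.eigh(A)
D1 = X.T @ A @ X                      # diagonal of eigenvalues

# Second choice: triangulation without row interchange, L A = U upper
# triangular with L unit lower triangular; then L A L' is diagonal.
L = np.eye(3)
U = A.copy()
for j in range(2):
    for i in range(j + 1, 3):
        m = U[i, j] / U[j, j]
        U[i] = U[i] - m * U[j]
        L[i] = L[i] - m * L[j]
D2 = L @ A @ L.T                      # diagonal of pivot quotients

def inertia(D):
    s = np.sign(np.round(np.diag(D), 10))
    return (int((s > 0).sum()), int((s == 0).sum()), int((s < 0).sum()))

assert inertia(D1) == inertia(D2) == (2, 0, 1)
```

The diagonals themselves differ (eigenvalues in one case, quotients of principal discriminants in the other), but the index of positiveness agrees, as the theorem requires.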
2
SOME NUMERICAL METHODS
2.1 INTRODUCTION
In this chapter we shall discuss some numerical methods for solving problems formulated in terms of matrices.
However, it must be stressed at the outset that we can do no more here than to indicate the basic principles involved, and give illustrations of a few of the almost innumerable variants of basic methods that exist.
Quot homines, tot sententiae: almost every worker in the field has his pet variation, which suits him and his computer better than any other.
For a wider discussion, readers are referred to texts such as Bodewig (5) and Wilkinson (6).
Most of the numerical problems which arise in matrix studies (particularly those which derive from dynamical systems) lead ultimately to one or both of two basic processes: (a) the inversion of a numerical matrix; (b) the evaluation of its eigenvalues and vectors.
It will be convenient to deal with these under separate headings.
The methods themselves also divide into two broad categories: (a) direct methods, in which the solution required is reached in a finite, predictable number of operations; (b) indirect methods, in which the solution is usually obtained by successive approximation procedures, when the number of operations is determined by the rate of convergence.
I: RECIPROCATION; LINEAR ALGEBRAIC EQUATIONS
2.2 DIRECT METHOD FOR MATRIX INVERSION
Under this heading we give three methods which are all very different in principle.
Each has variants; for each there are circumstances which favour its use.
2.2.1 Pivotal condensation
This method is the simplest, and probably the most universally used.
Suppose we have a square matrix A = [formula] of order n for which we require to find the reciprocal R: then AR = I.
Now it is possible to write down a sequence of matrices [formula] Mr (r is usually n or n + 1, depending on the variant employed) which, when used as a chain to premultiply A, condense it to the unit matrix, so that [formula] which evidently implies [formula] The numerical procedure is to operate successively on (1); given A we can write down M1 and we evaluate [formula] so that (1) becomes [formula] Knowing B we can now write down [formula] and form [formula] and so on, until R is obtained.
As we shall see, in practice it is not necessary to write down the matrices [formula]; we merely perform simple operations with the rows of A and I in (1), which is all that the matrices [formula] do.
For ease of exposition, let us suppose n = 4.
Consider the matrix product, corresponding to (3)
[formula] In writing down [formula] we have first arbitrarily selected a non-zero element [formula] of A as a pivot (underlined) for this first step in the operation.
[formula] is then obtained from the unit matrix by replacing its second column (corresponding to the subscript 2 of [formula]) by the third column of A divided through by [formula], with negative signs for the off-diagonal elements.
The third column of B is then null except for a unit in the position corresponding to [formula]
The product makes it clear that all we are doing, in fact, is to operate with the rows of A in a manner similar to that often used in the condensation of determinants.
Thus row 1 of B is row 1 of A minus [formula] times row 2 of A; row 2 of B is row 2 of A divided throughout by [formula]; row 3 of B is row 3 of A minus [formula] times row 2 of A; and so on.
Having obtained B, we now select a pivotal element in it for the next step; the selection is arbitrary except that the element must be nonzero and must not be in the second row, which contains the unit.
For example [formula] in which [formula] has been chosen as the pivotal element.
We now choose a pivotal element in C, excluding rows 2 and 3 containing the units.
Note that [formula] and [formula] cannot both be zero, or A would be singular and R would not exist; similarly for [formula]. We choose a nonzero element from these four, say [formula], and then [formula] The last step is not arbitrary: we must exclude the first three rows (containing units), so that the only choice of pivot in the next step is D44, which cannot be zero.
We write down [formula], evaluate E = [formula], obtaining a matrix which, if the rows are rearranged by premultiplication by an obvious matrix [formula], becomes the unit matrix.
As we have said, in practice we do not need to write down M1 etc.
When we use [formula] as a premultiplier of (1) we combine the rows of I in exactly the same ways as those of A. We can therefore write the two arrays side by side and operate on rows of 2n elements.
For example, to summarise what has been done above (we refer to row i as [formula]) we use Table 1.
[formula] When the left-hand array has been condensed to I, the right-hand array is R. This method has a number of variants.
First, it is possible to operate with columns, using postmultiplying matrices Mi; but this is really only transposition of operations on rows.
The variants derive mainly from the choice of pivot.
In the so-called optimal pivoting method, one chooses the element of largest modulus from among those available.
In a second variant, one chooses the element of largest modulus in each column in turn; in a third, the diagonal elements are chosen in order; and in a fourth, the diagonal elements of largest modulus are chosen in turn.
The third has the advantage that no final rearrangement of rows is required; on the other hand, it fails if a pivot becomes zero, or even very small.
The third variant is usually described as "diagonal pivoting without row interchange", the fourth as "diagonal pivoting with pivotal strategy".
It is to be observed that the determinant of A is readily found from the condensation procedure.
The determinant of [formula] is [formula], etc.
It follows that the determinant of A is [formula], i.e. the product of the pivotal elements chosen, multiplied by the determinant of [formula], which rearranges the rows, and so has the value [formula], according to whether the number of interchanges is even or odd.
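The whole procedure, with optimal pivoting and the determinant recovered as the product of the pivots times the sign of the final rearrangement, may be sketched as follows; the routine and the test matrix are our own illustration:

```python
import numpy as np

def reciprocal_by_condensation(A):
    """Gauss-Jordan inversion by pivotal condensation, optimal pivoting."""
    A = A.astype(float).copy()
    n = len(A)
    R = np.eye(n)
    used_rows, used_cols, pivots = [], [], []
    for _ in range(n):
        # largest-modulus element among the rows/columns still available
        _, i, j = max((abs(A[i, j]), i, j)
                      for i in range(n) for j in range(n)
                      if i not in used_rows and j not in used_cols)
        pivots.append(A[i, j])
        R[i] = R[i] / A[i, j]
        A[i] = A[i] / A[i, j]
        for k in range(n):             # annihilate the rest of column j
            if k != i:
                R[k] = R[k] - A[k, j] * R[i]
                A[k] = A[k] - A[k, j] * A[i]
        used_rows.append(i)
        used_cols.append(j)
    # final rearrangement: row i carries the unit in column j
    order = np.empty(n, dtype=int)
    for i, j in zip(used_rows, used_cols):
        order[j] = i
    sign = round(np.linalg.det(np.eye(n)[order]))   # +1 or -1
    return R[order], float(np.prod(pivots)) * sign

A = np.array([[2.0,  1.0, 3.0],
              [4.0, -1.0, 0.0],
              [1.0,  5.0, 2.0]])
R, detA = reciprocal_by_condensation(A)
assert np.allclose(R @ A, np.eye(3))
assert np.isclose(detA, np.linalg.det(A))
```

Operating on A and I side by side, as in Table 1, is exactly what the two row updates inside the loop do.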
Example 1
We exemplify these procedures using the matrix [formula] This matrix, in fact, has the determinant 100; since its elements are integers, the elements of the reciprocal will be terminating decimals.
In the intermediate steps, however, the numbers are incommensurable: for brevity of exposition we give four decimal places only, though of course a computer will normally employ a much greater number.
(i) Optimal pivoting
In Table 2, the last array on the right is the required reciprocal.
Note that, in the rearrangement of the last step, we crossed three rows in bringing [formula] to the top; then the new second row [formula] crossed two rows in going to the bottom.
Since the total number of crossings, 5, is odd, [formula] has the determinant −1; accordingly [formula] [formula] [formula] Coincidentally, this example also illustrates the second variant: it is adventitious that the largest element among those available occurs in the first, second, third and fourth columns in succession.
There is thus no difference in the working.
However, in the optimum method a computer is required to scan all elements available, i.e. [formula] in successive steps, while in the second variant only n, (n − 1), ..., 1 need scanning.
In this example, therefore, the second variant has the advantage; in a matrix of large order, the scanning process can take considerable computer time.
No scanning is needed in the third variant, where the diagonal elements are prescribed.
(ii) Diagonal pivoting
We now treat the same matrix by the third variant (see Table 3).
For this example, this variant presents no difficulty.
Also [formula] as before.
The student is invited to effect this reciprocation by means of the fourth variant, in which the largest diagonal element available is chosen as the pivot at each stage; it begins with the bottom right-hand element.
2.2.2 The method of submatrices
This method has nothing in common with the pivotal method, but has great advantages in certain circumstances.
(a) Suppose we have a computer which will accommodate a matrix of order m, but we require to reciprocate a larger matrix A of order n where [formula].
(b) Suppose we have a matrix B of which we know the reciprocal [formula]; B is now bordered, so that it becomes a submatrix of a larger matrix A, and we require A-1.
The method of submatrices is particularly useful in solving both these problems.
Let A be partitioned in the form [formula] Here A is of order n, B of order m, E of order n − m; C and D are in general rectangular.
Let the reciprocal of A be similarly partitioned: [formula] Then the equations [formula] provide relations from which, in general, [formula], [formula] may be explicitly found.
Various formulae are possible, but the most advantageous from the computational standpoint appear to be the following.
Write [formula] Then [formula] as may be checked by evaluating [formula] It is to be noted that, although these formulae involve a number of multiplications, only two reciprocations, B-1 (of order m) and [formula] (of order n − m), are involved.
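A sketch of these formulae in Python, writing S for the matrix of order n − m whose reciprocation is required (our own symbol for the quantity defined above; the submatrices are illustrative):

```python
import numpy as np

def invert_by_submatrices(B, C, D, E):
    """Reciprocal of A = [[B, C], [D, E]] via the two reciprocations
    B (order m) and S = E - D B^-1 C (order n - m)."""
    Binv = np.linalg.inv(B)                 # first reciprocation
    S = E - D @ Binv @ C
    Sinv = np.linalg.inv(S)                 # second (and last) reciprocation
    X = Binv @ C
    Y = D @ Binv
    return np.block([[Binv + X @ Sinv @ Y, -X @ Sinv],
                     [-Sinv @ Y,            Sinv]])

B = np.array([[4.0, 1.0], [2.0, 3.0]])
C = np.array([[1.0, 0.0], [2.0, 1.0]])
D = np.array([[0.0, 1.0], [1.0, 1.0]])
E = np.array([[5.0, 2.0], [1.0, 4.0]])
A = np.block([[B, C], [D, E]])

assert np.allclose(invert_by_submatrices(B, C, D, E), np.linalg.inv(A))
```

Everything else in the routine is multiplication, so a machine that can reciprocate matrices of order m can handle A of order up to 2m, as problem (a) requires.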
Problem (a)
It is clear that the formulae just established provide a method of solution for problem (a) since [formula] and the computer can reciprocate both B and [formula] and perform the required multiplications.
Example 1
Although it is very small, we may use the matrix A of Equation (2.2.1.4) to illustrate the principles involved.
We shall here partition it into four square submatrices, each of order 2.
In this simple illustrative case we avoid decimals.
Thus [formula] [formula] We are now able to evaluate the remaining submatrices of the reciprocal as [formula] Accordingly, the reciprocal is [formula] in agreement with the result found earlier.
Problem (b)
It is also clear that Equations (3) and (4) provide a solution for this problem.
We are given [formula] and we border B as in (1) and apply our solution to obtain [formula] The working is then exactly as in the example just given.
The formulae are, however, rendered much simpler if we have only line bordering, i.e. C is a column, D a row, and E a scalar.
Then X is a column, Y a row, and [formula] a scalar.
It follows that we can provide the answer to a third problem, (c): the successive evaluation of the reciprocals of the leading minors of a matrix A until [formula] is achieved.
Example 2
Again, we choose the matrix A of Equation (2.2.1.4): (i) The leading minor of order 1, b1, is the element 6: reciprocal 1/6. (ii) Using (i), we apply the formulae (3), (4) to find the reciprocal of the leading minor of order 2, viz: [formula] Here [formula] and we deduce [formula] (iii) We now use this result to examine the third order minor [formula] Then [formula] similarly, [formula] Also [formula] Note that, in Equations (4) which we now evaluate, [formula] is a matrix of rank 1.
We find [formula] (iv) Using [formula], we proceed to obtain the reciprocal of [formula].
We leave this step to the reader, but observe that [formula]
Since an element in the reciprocal of a matrix M is defined as the ratio of the cofactor of the corresponding element in M to the determinant [formula], it follows that, for example, [formula] and hence [formula] In this example, therefore, [formula]
2.2.3 Column (or row) building
This method makes use of Theorem IX of [formula] which states that if we have a square matrix Ao of order n, and know [formula], then by a simple calculation we can find the reciprocal [formula] of a new matrix [formula], where [formula] is a unit rank matrix, also of order n, [formula] being an arbitrary column and row respectively.
A second application of the theorem enables us to find [formula], where [formula] and so on, until we find [formula], where [formula] In this present context, suppose we have a matrix A of order n and require to find its reciprocal.
Then it is possible, using (1), so to choose [formula], etc., that Ao is transformed into A, when a last simple calculation gives [formula] The method has many variants: we can build A column by column, or row by row, or column and row simultaneously.
We give here what seems to be the simplest approach.
We choose [formula] (or possibly a scalar matrix [formula]).
Then, if [formula] is the isolating row vector, we form [formula] Here [formula] is the first column of A except that the leading element [formula] is diminished by 1; thus the first column of [formula] is identical with that of A, while the (n − 1) remaining columns are those of I. By Theorem IX of 1.22 [formula] Next we evaluate (this step is not quite so simple) [formula] so that [formula] Here [formula] is the second column of A but with the second element reduced by 1; [formula] and [formula] is the second element of [formula] Further steps follow this pattern and are obvious.
We may, however, observe that (4) may be written [formula] i.e. as the sum of [formula] and a rank 1 matrix of which the rows are proportional to the second row of [formula].
Example 1
We choose the same numerical matrix as before (Equation (2.2.1.4)), viz: [formula] so that [formula] [formula]
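The column-building cycle may be sketched as follows; the rank-1 correction of the running inverse is the modern Sherman–Morrison form of the text's Theorem IX, and the matrix shown is our own illustration:

```python
import numpy as np

def reciprocal_by_column_building(A):
    """Start from I; at step k replace column k of the current matrix by
    column k of A via a rank-1 update, correcting the inverse each time."""
    n = len(A)
    Acur = np.eye(n)
    Rcur = np.eye(n)
    for k in range(n):
        u = A[:, k] - Acur[:, k]        # change needed in column k
        v = np.eye(n)[k]                # isolating row vector
        denom = 1.0 + v @ Rcur @ u      # must be non-zero at every step
        Rcur = Rcur - np.outer(Rcur @ u, v @ Rcur) / denom
        Acur = Acur + np.outer(u, v)    # column k now agrees with A
    return Rcur

A = np.array([[3.0, 1.0, 2.0],
              [1.0, 4.0, 0.0],
              [2.0, 1.0, 5.0]])
assert np.allclose(reciprocal_by_column_building(A), np.linalg.inv(A))
```

Each intermediate matrix must itself be non-singular for the rank-1 formula to apply, which is the counterpart of the text's remark that the steps are "not quite so simple".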
2.2.4 Triangulation
We have already seen, in [formula] that reciprocation of a triangular matrix can be achieved by a simple process of back-substitution.
Accordingly, if we first reduce a matrix A to triangular form we can then proceed to determine its reciprocal.
The usual reduction to triangular form is by a pivotal condensation similar to that described in 2.2.1, except that we only remove elements below the diagonal.
We may best illustrate this with our usual example (see Table 1).
What we have done is to keep [formula] but use it to eliminate the leading elements of [formula] and so on, reducing the number of rows by 1 at each step.
We have now, in effect, achieved the equation [formula] This has been obtained by triangulation without row interchange.
If we now proceed to the process of back-substitution, this is in effect operating again with rows, and all it does is to complete the process which was done continuously in the pivotal condensation of 2.2.1.
There is thus no advantage at all in the intermediate triangulation step.
However, much attention has been given in the literature to inversion of triangular matrices by methods other than back-substitution, particularly iterative methods.
We illustrate below one direct method.
Triangulation has many important uses.
When we reduce a matrix A to triangular form, we write it as [formula].
We may first note that the product of the diagonal elements of the leading matrix in (1) is evidently [formula].
If we divide each row through by its diagonal element, (1) becomes [formula] The leading matrix here is of the form I + C, where C is null except for the elements above the diagonal.
Now, provided [formula] as r increases, [formula] as may be checked by premultiplication by I + C. In this example [formula] and on use of (3) we obtain from a single multiplication [formula] Premultiplication of (2) by (4) yields finally [formula] in agreement with results given earlier.
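The series for the reciprocal of I + C rests on C being nilpotent: with C null except above the diagonal, C to the power n vanishes, so the alternating expansion I − C + C² − ... terminates after n − 1 terms. A short sketch, with a random strictly upper triangular C of our own:

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
C = np.triu(rng.standard_normal((n, n)), k=1)   # null except above diagonal

# C is nilpotent: C**n = 0
assert np.allclose(np.linalg.matrix_power(C, n), 0)

# (I + C)^-1 = I - C + C**2 - ... , a finite sum of n terms
inv = np.eye(n)
term = np.eye(n)
for _ in range(n - 1):
    term = -term @ C
    inv = inv + term

assert np.allclose(inv @ (np.eye(n) + C), np.eye(n))
```

This is the direct (non-iterative) inversion of the unit upper triangular factor referred to above.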
2.3 ITERATIVE METHODS
These methods are well suited to computers and often have the advantage that, should an error occur, at worst it lengthens the computations a little.
2.3.1 Improvement of an approximate reciprocal
This is an important topic; it often happens that an approximate reciprocal of a matrix A is known: perhaps from a rough calculation, or one in which an error has been made, or even that belonging to a neighbouring matrix of A. Refinement of this approximate reciprocal is then required.
The usual procedure is as follows.
Let R be the exact reciprocal and [formula] the approximation.
Write [formula] Then [formula] Premultiply by [formula]; then, since [formula], [formula] Thus [formula] is a new approximation to R, and we may repeat the cycle as required to obtain R, subject to convergence of the method.
The conditions for this are readily established.
From (2) we find
[formula] and further steps give [formula] and so on.
Thus, provided [formula] as r increases, the method will converge very rapidly.
This will normally be the case if R1 is a reasonably good approximation, when the elements of [formula] will be small.
At each step the method requires the multiplication of two matrices such as AR1, the subtraction of the result from 2I, and then the second multiplication, e.g. [formula] to obtain the next approximation.
An alternative procedure which achieves the same result but is more convenient computationally is the following.
Write [formula] Then [formula] Inversion of this and premultiplication of the result by R1 gives, provided [formula] as r increases,
[formula] It is readily checked, in view of (3) et seq., that the product of the first two terms in (5) is R2, of the first three is R3, and so on.
The number of matrix multiplications in this procedure is the same as before; but the squaring of [formula] etc. is usually relatively easy since they become rapidly smaller in practice.
Indeed, if, for example in (5), [formula] and higher powers are sensibly null to the order of accuracy required, we can write [formula] saving one multiplication.
In practice, since the numbers in [formula] and I become progressively and rapidly more disparate, it is better to deal with them separately; i.e. the operations in (5) are carried out as follows, given A and R1: (i) evaluate [formula]; (ii) evaluate [formula]... until, to the order of accuracy required, a power of E1 is sensibly null; (iii) evaluate [formula] and add R1 to get R2; (iv) evaluate [formula] and add [formula] to get [formula]; (v) evaluate [formula] and add [formula] to get [formula]; and so on.
Normally, very few steps are needed.
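The refinement cycle R2 = R1(2I − AR1), with the error squared at every step, may be sketched thus; the matrix and the deliberately rough starting approximation are our own, not those of the example that follows:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 3.0]])
R = np.round(np.linalg.inv(A), 2)       # a deliberately rough reciprocal

# each cycle squares the error E = I - A R, so convergence is very rapid
for _ in range(5):
    R = R @ (2.0 * np.eye(2) - A @ R)

assert np.allclose(R, np.linalg.inv(A))
assert np.allclose(A @ R, np.eye(2))
```

Two decimal places of accuracy in R1 already give machine accuracy after a handful of cycles, which is the quadratic convergence described above.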
Example 1
Suppose we are given [formula] Then A is the matrix of our previous examples, and R1 an approximate reciprocal, given to three places of decimals.
It is required to find the reciprocal R of A, correct to six places of decimals.
Then [formula] where the figures are exact.
[formula] is thus exact to six decimal places, viz: [formula] but [formula] would require 12 decimal places.
We retain only eight: [formula] and to this order of accuracy [formula] is null.
Hence [formula] Since [formula] is null, we can apply (6), when [formula] where we have retained eight decimal places.
Finally, to six decimal places [formula]
2.3.2 General iteration
The most general iteration procedure appears to be the following.
We require to solve RA = I for R. Write the known matrix A as A = B − C, where B is a matrix having a known (or easily found) reciprocal.
Then the problem can be written [formula] where we have written [formula] There are various iterative methods for solving (2).
For example, we may write it as [formula] and use the regression formula [formula] so that, for example, if Ro = 0, we should obtain [formula] and so on.
Evidently, these are successive steps in the solution of (2), which is valid if [formula] as r increases.
[formula] A rather more rapid method is to write (5) as [formula] For the simple reciprocation of a matrix A these methods hardly compete with pivotal condensation; however, they do have a place in the parallel problem of solution of linear algebraic equations.
It must be stressed, however, that the choice of B must be such that [formula] has eigenvalues with moduli all less than unity; Sylvester's expansion then shows that [formula] as r increases.
There are certain physical problems yielding matrices for which this can be guaranteed; but in general some trial and error is involved.
It should be noted that (2) can be written alternatively as [formula] leading to [formula] It is important to observe that this general iteration procedure is really exactly the same as the method given in 2.3.1 for improvement of an approximate reciprocal.
The first of Equations (4) shows that [formula] is an approximate reciprocal, and then Equations (2.3.1.5) and (6) are the same, provided E1 and E are the same.
Now, if B-1 is the same as R1, [formula] so that the methods are effectively the same.
This result shows that, although in (1) no restriction is placed on B, it must in practice be a reasonable approximation to [formula] if the method is to converge.
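The general iteration may be sketched as follows, choosing B as the lower triangular portion of A so that C = B − A; the matrix is our own illustration, diagonally dominant so that the eigenvalue condition on B⁻¹C is satisfied:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
B = np.tril(A)                      # lower triangular part of A
C = B - A                           # so that A = B - C
F = np.linalg.inv(B) @ C

# convergence requires every eigenvalue of F to have modulus below unity
assert np.all(np.abs(np.linalg.eigvals(F)) < 1.0)

# R = (I + F + F**2 + ...) B^-1, summed until the terms are negligible
R = np.zeros((3, 3))
term = np.linalg.inv(B)
for _ in range(100):
    R = R + term
    term = F @ term

assert np.allclose(R, np.linalg.inv(A))
```

As the text observes, B⁻¹ here plays exactly the role of the approximate reciprocal R1 of 2.3.1.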
Example 1
We illustrate Equation (6) here.
Let [formula] Here we have chosen B as the lower triangular portion of A, and C is of course B − A. We are given that [formula] To four places of decimals we now find [formula] and to this order of accuracy, [formula] is null.
Then, successively (note that these are not the steps of (4)).
[formula] and [formula] is in fact the exact reciprocal of A.
2.3.3 Special iteration procedures
In Equation (2.3.2.1) the only restriction placed on B was that it should have a known or easily-found reciprocal.
We now discuss two particular choices for B.
(a) Seidel iteration.
Here B is chosen to be the lower triangular part of A, including the diagonal, with all superdiagonal elements null.
B is then reciprocated as in 1.9(8) and the methods of the last paragraph applied.
The example given there is in fact an illustration of Seidel iteration.
(b) Simple or diagonal iteration.
Here B is chosen to be the diagonal matrix of the diagonal elements of A. Its reciprocal can therefore be written down.
It is mostly used when A is a "sparse" matrix, with elements dominated by the diagonal; B-1 is then a reasonable approximation to R. If this is not the case, the method may not converge, or may do so only very slowly.
For example, if we take the matrix A of the last example, and choose [formula] then [formula] This matrix has two real eigenvalues, 0.319 and 0.597, and a pair of complex eigenvalues of modulus 0.999.
The powers of E therefore converge very slowly indeed.
A better illustration follows.
Example 1
Let [formula] Then, if we retain eight decimal places [formula] Hence, if [formula] and so on, we find as successive approximations to [formula] correct to six decimal places, [formula]
2.4 SPECIAL TYPES OF MATRIX
Certain matrices have properties which make reciprocation rather easier than would otherwise be the case.
We give a few examples.
2.4.1 Symmetric matrices
The reciprocal R of a symmetric matrix is itself symmetric.
For if A is symmetric and we transpose AR = I to get [formula], then evidently [formula] (see also 1.22, Theorem I, Corollary).
[formula] It follows that we do not need to compute all the [formula] elements of R; n(n + 1)/2 elements, comprised in the diagonal and below, are all that are required.
Example 1
We give in Table 1 a scheme for pivotal condensation of the matrix just examined in Example 2.3.3.1.
For convenience, the reciprocal has been multiplied by 103; we work to six decimal places on the left, three on the right.
If the last four rows of Table 1 are taken in reverse order, the required reciprocal is obtained.
It is to be noted that the starred numbers in Table 1 do not need to be calculated, though it is convenient to write them down.
Provided that appropriate proportions of [formula] are subtracted from [formula] as they stand to give zeros in the first column, the array of numbers on the left in [formula] can be written down; when the rest of [formula] is found, the third element of [formula] can be inserted, and so on.
Also, when [formula] is reached, this gives the last row of R and hence also the last column, so that the eighth elements in [formula] can be written down, and so on.
However, in practice, the programming required to avoid these calculations could be more time-consuming than the straight pivotal condensation, when the symmetry of the final result gives a check on the calculations.
The procedure in Table 1 also exemplifies triangulation without row interchange.
The product of the underlined elements in the table gives [formula].
2.4.2 Positive definite (symmetric) matrix
The method to be described is usually attributed to Choleski.
It is a simple matter, which we leave to the reader, to show that if A is real, symmetric and positive definite, then it can be written as the product B'B, where B is real and triangular.
Then if [formula] and since B is triangular, [formula] is readily found.
We give a simple example.
Example 1
Let [formula] The leading element gives [formula] (use of −10 makes no difference).
Then ab = 30, b = 3; ac = 10, c = 1; ad = -10, d = -1.
Next [formula] and so on.
In this way it is quickly established that [formula] Inversion of B as in 1.9(8) then yields [formula] and then [formula] It may be of interest to note that if x = { p, q, r, s } is any vector of real numbers, not all zero, then the quadratic form [formula] may be written (see also 1.22, Theorem XIII) [formula] where [formula] so that, where the second term does not involve p, the third p, q, and the fourth p, q, r, [formula] and is always positive under the conditions given.
If A is not positive definite, the elements of B are in general complex.
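Choleski's process is sketched below; NumPy's routine returns the lower triangular factor, i.e. the transpose of the B above, and the positive definite matrix is our own illustration:

```python
import numpy as np

A = np.array([[ 4.0,  2.0, -2.0],
              [ 2.0, 10.0,  2.0],
              [-2.0,  2.0,  5.0]])

# A = L L', with L = B' lower triangular (Choleski factorisation)
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.T, A)

# L is triangular, so its reciprocal comes by back-substitution;
# then A^-1 = (L')^-1 L^-1
Linv = np.linalg.inv(L)
R = Linv.T @ Linv

assert np.allclose(R @ A, np.eye(3))
assert np.allclose(R, R.T)          # the reciprocal is symmetric, as in 2.4.1
```

Only one triangular inversion is needed, the other factor being its transpose; this is the economy the method offers over general pivotal condensation.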
2.4.3 Persymmetric matrix
The reciprocal of a persymmetric matrix is not in general also persymmetric; e.g. [formula] A persymmetric matrix can be treated as symmetric for the purpose of reciprocation, but otherwise it is not special.
2.4.4 Centrosymmetric matrix
We shall consider here only matrices of even order; the odd-order case is quite straightforward, but is algebraically more complicated.
A centrosymmetric matrix A is characterised by A = JAJ, where J is the reversing matrix.
Inversion of this gives [formula] since J2 = I; the reciprocal of A is therefore also centrosymmetric, as are the integral powers of A. If A is of order 2m, it can be partitioned into four submatrices, each of order m, thus [formula] other representations are possible, but this appears to be the most convenient.
If (2) is premultiplied and postmultiplied by [formula] the result is A as required.
We can write the reciprocal of A as [formula] and then the product of (2) and (3) yields (twice) [formula] These may be solved if either P or Q is non-singular; we assume P-1 exists, and we write R for P-1Q.
Then from (4) we obtain [formula] The reciprocation of A may thus be effected through two reciprocations of order m, instead of the much more laborious single reciprocation of order 2m.
Example 1
Consider [formula] Then, in the above nomenclature [formula] Accordingly, [formula] A caution must be added: the method is not always viable.
Thus, for example, [formula] but this result cannot be obtained by use of (4) since by inspection the submatrices of order 2 are all singular.
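The formulae (4) and (5) are not reproduced above, so the sketch below reaches the same economy by an equivalent standard route: an orthogonal splitting of a centrosymmetric A of order 2m into two half-order blocks whose determinants multiply to det A, so this route succeeds whenever A is non-singular. The submatrices are our own illustration:

```python
import numpy as np

m = 2
A11 = np.array([[3.0, 1.0], [0.0, 4.0]])
A12 = np.array([[1.0, 2.0], [1.0, 0.0]])
Jm = np.fliplr(np.eye(m))                      # half-order reversing matrix
A = np.block([[A11, A12],
              [Jm @ A12 @ Jm, Jm @ A11 @ Jm]])
J = np.fliplr(np.eye(2 * m))
assert np.allclose(A, J @ A @ J)               # A is centrosymmetric

# Orthogonal Q reduces A to block-diagonal form diag(A11 + A12 Jm, A11 - A12 Jm)
Q = np.block([[np.eye(m),  np.eye(m)],
              [Jm,        -Jm]]) / np.sqrt(2.0)
Mplus  = np.linalg.inv(A11 + A12 @ Jm)         # first half-order reciprocation
Mminus = np.linalg.inv(A11 - A12 @ Jm)         # second half-order reciprocation
Z = np.zeros((m, m))
R = Q @ np.block([[Mplus, Z], [Z, Mminus]]) @ Q.T

assert np.allclose(R @ A, np.eye(2 * m))
assert np.allclose(R, J @ R @ J)               # the reciprocal is centrosymmetric
```

As with the partitioned formulae, the reciprocation of order 2m is replaced by two reciprocations of order m.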
2.5 LINEAR ALGEBRAIC EQUATIONS
A set of linear algebraic equations (often called simultaneous equations) has the form Ax = p.
Here A is, in general, a rectangular matrix of order m × n with given elements, p is a given column of m elements, and x a column of n unknowns; it is required to find x.
Suppose first that [formula].
Then we have more equations than unknowns.
If (1) is soluble, there must be n independent equations; if these are written first, we can partition (1) in the form [formula] where B is square and non-singular.
Then [formula] If s does not satisfy (2) the set of equations is inconsistent, and (1) does not have a valid solution.
If, however, the equations are consistent, then the solution (2) for x uses only n of them; the remainder are superfluous.
Now let [formula].
In this case, we have fewer equations than unknowns; we now partition (1) in the form (B, C) { y, z } = p, where B is square, of order m, and y is a column of m quantities.
We now obtain the partial solution for x [formula] in which we determine m of the unknowns x in terms of B, C, p and the remaining n − m unknowns z.
This is the parametric solution of 1.10.
Example 1
[formula] Considered as partitioned, the solution is (see (2)) [formula] and [formula] so that the equations are consistent; if they are taken any two at a time, the three cases all yield the same answer.
However, had the last element in p been other than 4, the three pairs of equations would have given three different answers.
Example 2
[formula].
Here, a simple example is [formula] which yields [formula] In Examples 1 and 2 the main numerical problem is the solution of Bx = r, By = p − Cz, where B is square.
We now confine our attention to the case where A is square and non-singular; this is obviously germane to the above solutions.
2.5.1 Direct Method
Our problem is to solve Ax = p, for x; here A is a given square and non-singular matrix of order n, p is a given column of n quantities, and x a column of n unknowns.
Formally, the solution is obvious: x = A-1p.
We can therefore obtain our solution by inverting, using any of the methods of 2.2, 2.3, and postmultiplying the inverse by p.
In some problems we require to obtain x for each of a set of values for p; e.g. the elements of p may be functions of an independent variable, and we require x for each of a number of values of the variable.
If this number is high  in particular if it exceeds n  then labour is saved by inverting A and using (2).
If, on the other hand, the number is small, labour is saved as follows.
We use the approach of 2.2.1; but instead of operating on (2.2.1.1) with the matrices Mi, we operate directly on (1).
Evidently, we then progressively approach the solution (2).
The computations, by simple operations on rows, are effected in exactly the same way.
Indeed, reciprocation is really only the solution of a particular set of linear algebraic equations, in which the vectors p are successively [formula] For a single set, in place of e1 we use the single vector p.
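The direct method can be sketched as follows: we operate by row combinations on the augmented array [A | p], with diagonal pivoting and no row interchange, exactly as in an inversion scheme but with the single vector p in place of the unit columns. The 3 × 3 matrix is hypothetical, chosen diagonally dominant so that diagonal pivoting is safe:

```python
import numpy as np

# Hypothetical system (diagonally dominant, so diagonal pivoting is safe).
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 1.0],
              [2.0, 1.0, 5.0]])
p = np.array([7.0, 5.0, 8.0])

M = np.hstack([A, p[:, None]])  # augmented array [A | p]
n = len(p)
for k in range(n):
    M[k] = M[k] / M[k, k]       # reduce the pivot to unity
    for i in range(n):
        if i != k:
            M[i] = M[i] - M[i, k] * M[k]   # clear the rest of column k
x = M[:, -1]                    # the right-hand column is now the solution

print(np.allclose(A @ x, p))    # True
```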
Example 1
Suppose we require to solve [formula] The square matrix used here is that employed for the Examples of 2.2; we choose diagonal pivoting without row interchange.
The working of the left-hand array is thus identical with that of Example 2.2.2.1.
See Table 1.
The last four figures in the right-hand array are given correct to three decimal places and are in fact the exact solution.
2.5.2 Iterative methods
We exemplify the general iteration procedure of 2.3.2; we write B - C for A, where the reciprocal of B is known or easily obtained.
Then Equation (2.5.1.1) can be written [formula] so that, if F is written for B⁻¹C, the solution is given by [formula] say.
As before, this may be developed in either of two ways, provided [formula] as r increases: x = q + Fx, suggesting the regression formula [formula] or [formula] Though the computations may be different, these two formulae are equivalent; for if xo = 0 the regression formula gives [formula] and so on.
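The regression formula can be sketched in numpy as follows, on a hypothetical diagonally dominant matrix with B taken as the lower triangle of A (a splitting assumed here for illustration; for this matrix the iteration converges):

```python
import numpy as np

# Hypothetical diagonally dominant matrix; split A = B - C with B the
# lower triangle of A (elements to the right of the diagonal zero).
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
p = np.array([2.0, 6.0, 2.0])

B = np.tril(A)
C = B - A                       # so that A = B - C
F = np.linalg.solve(B, C)       # F from B F = C
q = np.linalg.solve(B, p)       # q from B q = p

x = np.zeros(3)
for _ in range(25):             # regression formula: new x = q + F (old x)
    x = q + F @ x

print(np.allclose(A @ x, p))    # True: the iteration has converged
```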
We illustrate (3) below.
Example 1
It is required to solve [formula] The square matrix has been used in Example 2.3.2.1.
We again choose B to be A except that the elements to the right of the principal diagonal are all zero: then, as before [formula] [formula] Note that here F = B⁻¹C: in the previous example, E = CB⁻¹.
We now construct the iteration table (see Table 1), beginning with x1 = q and applying (3).
Since F has two null columns, the first two elements of x are always multiplied by zeros; it is therefore not necessary to compute them until the iterations are complete.
At the beginning of the iteration, we have retained only two decimal places: later three, then four, until to this order of accuracy the iteration repeats.
Finally, the first two elements are computed, when the last column (here x10) is the solution required.
It may be observed that equations of the form x = q + Fx arise naturally in some dynamical or quasi-static problems, e.g. the distortion of a structure under applied load, when the load is varied by the distortion.
2.5.3 Choleski's method for positive definite matrices
Let us now suppose that in 2.5.1 A is positive definite.
Then as in 2.4.2 we can write A as [formula] where B is triangular.
The equations for solution are now [formula] Write y for Bx; then (1) can be solved in two steps: [formula] and then Bx = y for x.
It may be noted that a set of equations Ax = p, where A is general, may be cast in the above form by premultiplication by Aᵀ: [formula] However, this does not appear to offer any computational advantage.
Resolution of [formula] into [formula] implies A = CB where C is an orthogonal matrix, so that [formula].
Again, however, this does not appear to offer any worthwhile computational short-cut.
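The two-step solution itself can be sketched with numpy on a hypothetical positive definite matrix (note that numpy's cholesky returns a lower triangular factor B with A = BBᵀ, so the steps are By = p by forward substitution and then Bᵀx = y by back substitution):

```python
import numpy as np

# Hypothetical positive definite matrix.
A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
p = np.array([10.0, 15.0, 17.0])

B = np.linalg.cholesky(A)       # lower triangular factor, A = B B^T
y = np.linalg.solve(B, p)       # forward substitution:  B y = p
x = np.linalg.solve(B.T, y)     # back substitution:     B^T x = y

print(np.allclose(A @ x, p))    # True: two triangular solves, no inversion
```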
Example 1
Suppose we wish to solve [formula] As in 2.4.2, we can quickly resolve the square matrix into [formula] A self-explanatory scheme for computation is shown in Table 1.
Here BT, p are written at the head of the left-hand array; by obvious row combinations we find y at the bottom.
This and B are inserted at the top of the right-hand array, when again obvious row combinations give the required solution at the bottom right: in summary [formula]
2.5.4 Least squares
In 2.5 we noted that if we have more equations than unknowns, the equations must be consistent, or a unique solution does not exist.
Nevertheless, especially in experimental work, cases often arise where the number of equations exceeds the number of unknowns.
For example, suppose we have a measurable quantity f which we have reason to suppose is a polynomial function of an independent variable t: [formula] and we wish to find the m coefficients [formula].
Measurement of f at each of m values of t would provide m equations for the m unknowns; however, since f is measured and so is inexact, it may be thought better to measure f at n values of t, where [formula] in order to minimise the effects of experimental error.
How do we proceed to find optimum values of the coefficients [formula]?
The "least squares" solution of this problem is due to Gauss.
Let the n equations in m unknowns [formula] be written Ax - p = e, where A is of order (n × m), x is the column of m unknowns [formula], p is the column of n measured values of f, and e is a column of n errors.
We are given A and p; e will vary with x.
Gauss' proposition is that x will have its optimum value when the sum of the squares of the errors, i.e. [formula], is a minimum.
Now [formula] and for this to be a minimum, its differential with respect to x (see 1.11) must vanish; i.e. [formula] which is our original set of Equations (2) with e = 0, premultiplied by Aᵀ; it gives m equations for the m unknowns.
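A numpy sketch of the normal equations AᵀAx = Aᵀp, fitting a quadratic to hypothetical noisy measurements invented around the law f = 1 - 2t + t² (the same polynomial as in Example 1 below, but with made-up data):

```python
import numpy as np

# Hypothetical measurements of f at six values of t; the underlying law
# is f = 1 - 2t + t^2, with invented errors of about +/- 0.1.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
f = np.array([1.1, -0.1, 0.9, 4.1, 8.9, 16.1])

A = np.vander(t, 3, increasing=True)        # columns 1, t, t^2 (n > m)
x = np.linalg.solve(A.T @ A, A.T @ f)       # normal equations A^T A x = A^T p

e = A @ x - f                               # the residual errors
print(np.allclose(x, [1.0, -2.0, 1.0], atol=0.2))   # True: close to the law
```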
Example 1
Suppose we expect the relation (1) to be [formula] and suppose further that we have the following experimental table: [formula] Then in view of (5) our equations for solutions may be written [formula] Before we proceed to the least squares solution, let us see what results would be obtained if we solved these equations three at a time, with e taken to be zero for each set of three.
We should get [formula] These are all very different.
Moreover, if we adopted the first solution and applied it throughout, e would become (the first three elements have been assumed zero) [formula] which is clearly quite unacceptable for the later measurements.
However, if we premultiply (6) by Aᵀ as in (4) we get [formula] and the solution, by inspection, is x = { 1, -2, 1 }, i.e. [formula] If we now return to (6) and use this solution, we find [formula] which compares very favourably with the result given earlier.
2.5.5 Least squares: weighted errors
In the basic Gaussian method just described, it has been tacitly assumed that all errors are equally likely.
The method is readily adapted, however, to cases where some weighting of errors is desirable: it is merely a case of premultiplying (2.5.4.2) by an appropriate diagonal matrix.
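The weighted variant can be sketched as follows (hypothetical straight-line data with errors assumed to grow with t): both sides of the equations are premultiplied by the diagonal weighting matrix before the normal equations are formed:

```python
import numpy as np

# Hypothetical straight-line data f = x0 + x1 t, errors proportional to t.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
f = np.array([2.6, 3.1, 3.4, 4.1, 4.4, 5.1])

A = np.column_stack([np.ones_like(t), t])
W = np.diag(1.0 / t)                 # weights proportional to 1/t

WA, Wf = W @ A, W @ f                # premultiply by the weighting matrix
x = np.linalg.solve(WA.T @ WA, WA.T @ Wf)   # weighted normal equations

print(round(float(x[1]), 1))         # slope of the optimum line: 0.5
```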
Example 1
Suppose we wish to put a straight line [formula] through the following set of experimental points: [formula] Suppose further that, in view of the experimental method, we may expect errors in f to be proportional to t.
Let us first examine the straight Gaussian solution.
The equations can clearly be written [formula] Premultiplication by Aᵀ yields [formula] so that the optimum line, if all errors are equally likely, is f = 0.57t + 1.80.
However, we are given that errors are likely to be proportional to t.
To make them equally likely, we must multiply the first of Equations (1), at t = 1, by 6; the second, at t = 2, by 3; the third, at t = 3, by 2; and so on; i.e. we premultiply (1) by Diag(6, 3, 2, 1.5, 1.2, 1).
The result is [formula] and premultiplication of this by [formula] yields [formula] to two decimal places.
Hence in this case, the optimum line is f = 0.50t + 2.01.
For comparison, the errors e in the two solutions (2) and (3) are (the lines intersect at t = 3) [formula] and it is clear that in the second case the errors increase markedly with t, as required.
II: EIGENVALUES AND VECTORS
2.6 THE CHARACTERISTIC EQUATION
The n eigenvalues of a square matrix A = (Aij) of order n are given (see 1.16) by the roots of the characteristic equation [formula] which, in expanded form, can be written [formula] In theory, therefore, the calculation of the eigenvalues of A is straightforward: we form the characteristic determinant in (1), expand it to the form (2), and solve this by any suitable means to obtain [formula].
Having obtained the eigenvalues, we can readily find the corresponding eigenvectors.
If [formula]r is a known eigenvalue (which we assume here to be unrepeated) and [formula] the corresponding column and row eigenvectors respectively, then [formula] Since [formula] are each arbitrary to a scalar multiplier, we can in each vector make a suitable element unity; then Equations (3) provide in each case n linear algebraic equations for the (n - 1) unknown elements: one equation is superfluous, or can be used as a check.
For example, if we make the last element of [formula] unity, and partition ([formula]) to isolate its last column and row, we can write (3) as [formula] which give [formula] Equations (5) can be solved by any of the methods of 2.5 (the second would be transposed to [formula]), and then [formula] Unfortunately, so far as the expansion of (1) to (2) is concerned, theory and practice are at odds.
With [formula] general, the labour involved in the expansion is quite prohibitive even for small values of n, and indirect methods are always used in practice.
We describe two such methods below; first, however, we observe that, in (2), we can readily obtain the two coefficients p1, pn, since (see 1.22, Theorems II, III) [formula] The evaluation of a numerical determinant, even of high order, is very quickly accomplished on any modern computer by reduction to triangular form; thus we can find p1 and pn without difficulty.
It remains, however, to find p2, p3,..., pn-1.
2.6.1 Location method
In Equation (2.6.1) we arbitrarily assign (n - 2) different values to [formula] (other than [formula] = 0, which we have in effect used to determine pn) and then evaluate the (numerical) determinant for each of these.
We then obtain n - 2 linear algebraic equations for the n - 2 unknowns pi.
A caution should be added: the arbitrary values of [formula] should preferably lie among the zeros of [formula]; otherwise inordinately high accuracy may be needed.
At this stage the zeros of [formula] are of course not yet known, but their approximate locations may often be inferred from their sum and product as given by (2.6.6) and (2.6.7).
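The location method can be sketched as follows on a hypothetical 3 × 3 matrix (numpy's det performs the triangular reduction for us). With the leading coefficient known to be unity, evaluating the determinant at trial values (including 0, which in effect determines the constant term) gives linear equations for the remaining coefficients:

```python
import numpy as np

# Hypothetical matrix whose characteristic polynomial we seek.
A = np.array([[9.0, 2.0, 1.0],
              [2.0, 5.0, 1.0],
              [1.0, 1.0, 2.0]])
n = A.shape[0]

# Evaluate the numerical determinant at a few trial values.
trials = np.array([0.0, 5.0, 10.0])
d = np.array([np.linalg.det(lam * np.eye(n) - A) for lam in trials])

# det(lam I - A) = lam^3 + c2 lam^2 + c1 lam + c0: three linear equations.
V = np.column_stack([trials**2, trials, np.ones(3)])
c = np.linalg.solve(V, d - trials**3)

roots = np.roots(np.concatenate([[1.0], c]))
print(np.allclose(np.sort(roots.real), np.sort(np.linalg.eigvals(A).real)))
```

The trial values here straddle the eigenvalues, in line with the caution above.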
Example 1
Let [formula] The sum of the eigenvalues is thus [formula] and their product 3906.25 (=p4).
Their arithmetic mean is thus about 11 and the fourth root of [formula] about 8.
These suggest that we might use [formula] = 20, [formula] = 10 in (2.6.1); we have already used [formula] = 0 to obtain [formula].
It is readily found, by condensing the numerical determinants, [formula] Now we know that [formula] or [formula] Insertion of [formula] = 20, [formula] = 10 in (3), together with the numerical values from (2), yields the two equations [formula] which give p2 = 606.25, p3 = 2812.5.
Hence [formula] on factorisation.
The eigenvalues of A are thus 25, 12.5, 5, 2.5.
We illustrate the calculation of the eigenvector [formula] corresponding to the eigenvalue 12.5.
Then [formula] giving [formula] of which the solution is (any method from 2.5 may be used) [formula] and it is readily checked that these also satisfy the fourth equation.
2.6.2 Iteration method
In view of the Cayley-Hamilton theorem, A satisfies its own characteristic equation; i.e. [formula] Postmultiply this by any arbitrary column co and write [formula].
Then a computer will rapidly evaluate the successive products [formula] and so on.
Accordingly (1) yields [formula] This provides n linear algebraic equations for the n unknowns pi.
As before, we can find [formula] independently if we so wish.
It must be added, however, that this method is not suitable when n is large.
As we shall see later, the columns ci usually tend to proportionality as i increases, so that Equations (2) become increasingly ill-conditioned; also the numbers involved can assume widely different magnitudes.
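The iteration method can be sketched on a small hypothetical matrix (small n, in line with the caveat above): the columns c0, Ac0, A²c0 and A³c0 give linear equations for the coefficients via the Cayley-Hamilton theorem:

```python
import numpy as np

# Hypothetical small matrix (the method is unsuited to large n, as noted).
A = np.array([[9.0, 2.0, 1.0],
              [2.0, 5.0, 1.0],
              [1.0, 1.0, 2.0]])

c0 = np.array([0.0, 0.0, 1.0])          # arbitrary starting column e3
K = np.column_stack([c0, A @ c0, A @ A @ c0])
c3 = A @ (A @ (A @ c0))

# Cayley-Hamilton: A^3 c0 = a0 c0 + a1 A c0 + a2 A^2 c0.
a = np.linalg.solve(K, c3)
coeffs = np.concatenate([[1.0], -a[::-1]])  # lam^3 - a2 lam^2 - a1 lam - a0

roots = np.roots(coeffs)
print(np.allclose(np.sort(roots.real), np.sort(np.linalg.eigvals(A).real)))
```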
Example 1
We use the matrix A of (2.6.1.1) with co = e4.
It is found that successive columns ci are then [formula] Accordingly, (2) can be written [formula] in agreement with (2.6.1.4).
With the particular choice of co, p4 enters into the last equation only.
Accordingly, if we calculate p1, p4 from the trace and determinant of A, we can discard the last equation and substitute for p1 in the remainder, when (3) reduces to [formula] Any two of these (consistent) equations give p2, p3 their values in (4).
2.7 POWER METHODS
The most commonly used device for finding the eigenvalues and vectors of a matrix A is the power method, or one of its many variants, in which A is effectively raised to a high power.
We illustrate some aspects of the basic principle.
2.7.1 A dominant eigenvalue
Sylvester's expansion for A is (see 1.18), in its simplest form, [formula] and in view of the properties of the constituent matrices [formula], [formula] Since the order of the terms on the right is immaterial, there is no loss of generality in supposing [formula]1 to be the eigenvalue of greatest modulus.
If for the moment we assume it to be real and unrepeated, then all other eigenvalues have smaller moduli.
Thus if r is increased sufficiently, say to a value s, the first term on the right of (2) dominates the rest, which become relatively insignificant.
Then, in the limit, [formula] By use of (3) we thus find both [formula]1 and [formula].
We could in theory find them from Aˢ only, since [formula], but there is a practical difficulty.
Unless the modulus of [formula]1 is close to unity, the numbers appearing in Ar (r large) are either inordinately large or small; in practice therefore it is usual to find not Ar but a matrix proportional to it: at each stage in the power evaluation, a homologous element is reduced to unity.
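The squaring process with normalisation can be sketched as follows on a hypothetical matrix: at each squaring the bottom right-hand element is reduced to unity, and in the limit the normalised power is of unit rank, from which the dominant eigenvalue and its column vector are recovered:

```python
import numpy as np

# Hypothetical matrix with a well separated dominant eigenvalue.
A = np.array([[9.0, 2.0, 1.0],
              [2.0, 5.0, 1.0],
              [1.0, 1.0, 2.0]])

B = A / A[-1, -1]              # reduce the bottom right-hand element to unity
for _ in range(6):             # B is now proportional to A^(2^6)
    B = B @ B
    B = B / B[-1, -1]

# In the limit B is of unit rank; any column is proportional to x1.
x1 = B[:, -1]
lam1 = (A @ x1)[-1] / x1[-1]   # one application of A recovers the eigenvalue

print(abs(lam1 - max(np.linalg.eigvals(A).real)) < 1e-6)   # True
```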
Example 1
Let A be the matrix in Equation (2.6.1.1).
Then, if we reduce the bottom right-hand element to unity, we have [formula] We now evaluate B2 and divide through by the bottom right-hand element to obtain [formula] Evaluation of F2/2.4997 repeats F except for occasional small differences in the fourth decimal place.
Now, if we round off to three decimal places, we find that F can be written as the unit rank matrix, proportional to [formula] [formula] If we now revert to A and postmultiply by [formula] [formula] Thus, [formula]1 = 25.
Also, the inner product [formula] is [formula] so that we normalise to make [formula] unity. Note that [formula] so that A has been raised fairly rapidly to a high power.
An alternative.
There is, however, an alternative device which is in practice more commonly used; this is repeated premultiplication by A of an arbitrary column [formula] as in 2.6.2.
We successively evaluate [formula] etc.
If we postmultiply (2) by co and write [formula] (ki is of course a scalar) we obtain [formula] Accordingly, provided that co is not such that k1 ([formula]) vanishes, the first term on the right dominates the rest as r increases, until in the limit when r = s [formula]
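This alternative device can be sketched on the same kind of hypothetical matrix: repeated premultiplication of an arbitrary column by A, dividing at each step by the bottom element; the dividing factor tends to the dominant eigenvalue:

```python
import numpy as np

# Hypothetical matrix; c0 = e3, dividing each step by the bottom element.
A = np.array([[9.0, 2.0, 1.0],
              [2.0, 5.0, 1.0],
              [1.0, 1.0, 2.0]])

c = np.array([0.0, 0.0, 1.0])
for _ in range(60):
    c = A @ c
    factor = c[-1]             # the dividing factor tends to the eigenvalue
    c = c / factor

print(abs(factor - max(np.linalg.eigvals(A).real)) < 1e-8)   # True
```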
Example 2
We exemplify this using the same matrix A as above; also, we adopt the device of reducing at each stage a homologous element (in this case the bottom element) to unity, beginning with co = e4.
The first step is [formula] Proceeding in this way, we find successive columns as in Table 1.
Also given are the dividing factors at each step.
Since c16 repeats c15, they give x1; also the dividing factor 25 is [formula]1.
To complete the solution, we now require to find [formula]; this could be done by repeating the iterative procedure, evaluating [formula], where [formula] is an arbitrary row, or more usually, by the device given in (2.6.4), viz: [formula] giving [formula] We leave it to the reader to show that in this example, (7) leads to [formula] or to a scalar multiple of it.
We normalise [formula] in the usual way.
We may note in passing that (as they should) the columns [formula] and [formula] of Table 1 agree with the last columns of B, C, D, E, F (subject to rounding-off errors in the last place of decimals).
This leads naturally to a comparison of the two methods.
There are two reasons why the second method is commonly used.
First, if an error (even of rounding-off only) is made during the successive squaring process, it is multiplied by itself [formula] subsequently, and persists through the calculations; in the second method, if an error is made in calculating ci, this only amounts to choosing a new arbitrary column, and at worst may prolong the calculations.
But, mainly, the first method usually involves more work than the second.
It is difficult to quantify "work" exactly, but if the number of individual multiplications is taken as a guide, then, if A is of order n, each of the n² elements of [formula] involves n multiplications: i.e. n³ in all.
If r squaring processes are needed, we perform rn³ multiplications, yielding [formula]; r must be an integer.
In the column process, each step involves n² multiplications; hence, to reach the same accuracy, between [formula] and [formula] multiplications are needed.
Now, provided the eigenvalues are reasonably separated, column iterations usually number between 15 and 50.
Suppose, in a particular case, 25 are required.
Then the "work" is measured as [formula] From Table 2, r must be 5, with the work measure for squaring 5n³.
The ratio is n/5; so if [formula] the work involved in squaring is greater.
If, say, n is 100, the work is 20 times greater.
[formula]
2.7.2 Subdominant eigenvalues: deflation
Again, for the moment we assume the eigenvalues of A to be real and unrepeated; and we assume that in (2.7.1.1) they appear in descending order of modulus.
If, as in 2.7.1 we have found [formula], then Equation (2.7.1.1) can be written [formula] The known matrix on the left now has all the eigenvalues and vectors of A except the first set, and it can be treated as in 2.7.1.
The process of reducing A in this way is known as "deflation".
Evidently the eigenvalue [formula]1 has been replaced by zero, so that A1 is singular.
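Deflation can be sketched on a hypothetical symmetric matrix (symmetry is an assumption made here for brevity: it makes the row eigenvector the transpose of the column one, simplifying the unit rank subtraction):

```python
import numpy as np

# Hypothetical symmetric matrix, so the row eigenvector is x1 transposed.
A = np.array([[9.0, 2.0, 1.0],
              [2.0, 5.0, 1.0],
              [1.0, 1.0, 2.0]])

def dominant(M, steps=100):
    """Column iteration, dividing at each step by the bottom element."""
    c = np.array([0.0, 0.0, 1.0])
    for _ in range(steps):
        c = M @ c
        factor = c[-1]
        c = c / factor
    return factor, c

lam1, x1 = dominant(A)
# Deflate: subtract lam1 x1 x1^T / (x1^T x1), replacing lam1 by zero.
A1 = A - lam1 * np.outer(x1, x1) / (x1 @ x1)

lam2, _ = dominant(A1)         # iterate again on the deflated matrix
ev = np.sort(np.linalg.eigvals(A).real)
print(abs(lam1 - ev[-1]) < 1e-8 and abs(lam2 - ev[-2]) < 1e-8)   # True
```

The deflated matrix is singular, as the text notes: its determinant vanishes to rounding error.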
Having evaluated A1, we can again apply column iteration.
However, we need not begin with an arbitrary column; we can readily find an appropriate column to begin with, which avoids many iteration steps.
An example will make this clear.
Example 1
We continue Example 2.7.1.1.
The iterations have shown that [formula]1 = 25, while x1 is proportional to { 1, 2, 3, 5 } and [formula] to (1, 7, 5, 4).
Normalising these, we find [formula] Note that [formula] vanishes when premultiplied by [formula].
We now revert to the iterations for [formula]1 and observe that, for example, c9 of Table 2.7.1.1 is mostly composed of x1.
The remainder will clearly be mostly [formula].
Since x1 is orthogonal to [formula], the part of c9 due to x1 will disappear when it is premultiplied by A1 (see (1)).
Thus we find [formula] and dividing by the last element, we find as a starting column co for iteration with A1 [formula] The iterations give us successive columns as in Table 1.
Column c9 repeats c8, so that [formula]2 = 12.5 and [formula] is proportional to { -0.5, -0.5, 0, 1 }.
As before, we now find that [formula] is proportional to (-1, -2,0,1).
Normalising, we find [formula] We proceed to the third eigenvalue; we further deflate A by subtracting the unit rank matrix (3) from A1: [formula] Note that [formula] = 7.5, while [formula] is in fact doubly degenerate; it vanishes when premultiplied by [formula] or [formula].
As before, we choose a column from Table 1  say c3 = { -0.4978, -0.4992, -0.0015,1 }  and apply it as a postmultiplier to [formula].
The result is a column proportional to { 0.981, 0.0095, -1.0095, 1 }.
We leave it to the reader to show that this leads rapidly to [formula]3 = 5 with x3 proportional to { 1, 0, -1, 1 } and that [formula] is then found to be proportional to (27, -11, -15, 8).
We normalise this to obtain [formula] and now further deflate A to obtain A3; in this case, as will be seen, no iteration is required to obtain a solution: [formula] In obtaining the last form of (5) we have noted (a) that [formula]; since the other three eigenvalues have been replaced by zeros, [formula]4 = 2.5; (b) A3 is, by inspection, of unit rank and so, with [formula]4 extracted, can be written in the form (5).
This completes the solution.
With the four terms of the Sylvester expansion now given numerically in (2), (3), (4) and (5), the whole solution can be expressed compendiously as [formula]
2.7.3 Subdominant eigenvalues: sweeping
Although the deflation procedure discussed in 2.7.2 is the most straightforward possible method, other devices are often useful and sometimes preferable.
We now discuss "sweeping", which takes two forms.
(i) First method
Instead of operating with a deflated matrix, we operate on the original matrix A (or a part of it) with a column vector from which all contributions from x1 have been "swept" away.
This is valuable, for example, when (as is often the case in dynamical problems) A is sparse.
Deflation destroys this characteristic; the sweeping method under discussion does not.
With r = 0, Equation (2.7.1.5) becomes [formula] Hence if we choose co to make k1 zero, it will contain no contribution from x1, and, as r increases sufficiently to a value s, (2.7.1.5) will tend to [formula] The condition [formula] means that we can express one element of co in terms of the remaining n - 1.
In theory, if co is chosen in this way, the iterations will in due course yield (2) without further attention.
In practice, rounding-off errors soon produce inaccuracies which re-introduce small proportions of x1, which tend to grow relatively rapidly.
It is therefore best to apply the criterion [formula] at each step; i.e. to calculate one element of cr from this criterion when the others have been found.
This means that we can ignore one row of A; we exemplify this below.
As before, we need not be completely arbitrary in our choice of co (though it must satisfy [formula]): from (1), a column free of x1 is [formula] with co arbitrary.
But we can with advantage choose co to be a column from the previous iteration to find [formula]1 etc.: this will contain mostly x1, some [formula], but little else; then [formula] eliminates the contribution from x1.
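The first sweeping method can be sketched on a hypothetical symmetric matrix: we iterate on A itself, but at every step remove the x1 content of the column (here the dominant eigenvector is taken as already known, and, A being symmetric, the sweeping condition is simply that x1 and the column have zero inner product):

```python
import numpy as np

# Hypothetical symmetric matrix; x1 (unit norm) is assumed already found.
A = np.array([[9.0, 2.0, 1.0],
              [2.0, 5.0, 1.0],
              [1.0, 1.0, 2.0]])
ev, vec = np.linalg.eigh(A)
x1 = vec[:, -1]                      # dominant eigenvector

c = np.array([0.0, 0.0, 1.0])
c = c - x1 * (x1 @ c)                # sweep x1 out of the starting column
for _ in range(100):
    c = A @ c
    c = c - x1 * (x1 @ c)            # re-sweep: rounding re-introduces x1
    factor = c[-1]
    c = c / factor

print(abs(factor - ev[-2]) < 1e-8)   # True: the subdominant eigenvalue
```

Note that A itself is used unchanged throughout, which is the point of the method when A is sparse.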
Example 1
We use the same example as before, restating for convenience what we have already found.
[formula] Note that [formula] have been normalised to give [formula].
Now any postmultiplying column, say { a, b, c, 1 }, must satisfy [formula] In performing our iterations, therefore, we may ignore completely the first row of A; we use the last three rows to find the corresponding elements in a column, and then apply (3) to find the top element.
For our initial column co, we again choose the column c9 from Table 2.7.1.1, in the iteration to find [formula]1 etc.; thus for our present purpose [formula] Then [formula] and [formula] As it should, this column vanishes when premultiplied by [formula].
We divide throughout by the last element to obtain as our new starting column [formula] then the iterations, using (3) at each step, begin [formula] We postmultiply A (ignoring its top row) by the column (4) to get the column of three numbers on the right; divide by the last element to get the bottom three elements in the next column, and find the top element from (3).
In this way, we obtain the successive columns and dividing factors in Table 1.
Thus we find [formula].
As before, knowing [formula]2, we find [formula] is proportional to (-1, -2,0,1).
Normalising: [formula] The same principle serves for the third root.
If cr = { a, b, c, 1 } then [formula] and [formula] must both vanish; we find [formula] If we select c3 from Table 1 as our new co, it already excludes x1, so that [formula] will also exclude [formula].
This is [formula] If this is written{ 0.9333,0.0333,  1.0333,1 }, it satisfies (6).
We now iterate on A, but ignore the first two rows: [formula] We leave it to the reader to show that this converges quickly to [formula]3 = 5, x3 = { 1,0, -1,1 }.
In the usual way, [formula] is found to be proportional to (27, -11, -15, 8).
Normalising: [formula] As before, no iteration is needed for the completion of the solution.
If we choose the last column above (which excludes x1 and [formula]) as a new co, then a column which also excludes x3 is [formula] Since x1, [formula], x3 are excluded, this column is proportional to x4.
Premultiplication by a row of A gives [formula]4 = 2.5; completion of the solution is straightforward.
(ii) Second method
Instead of sweeping our iterated columns at each step, we can do it once for all with the aid of a "sweeping matrix".
In view of (3) we may write [formula] If (8) is used as postmultiplier of A, it is clear that A is deflated by the sweeping matrix in (8) to [formula] (note that A1 here differs from the deflated form A1 of 2.7.2).
If A is of simple form (e.g. sparse) this characteristic may be partly or wholly destroyed in forming A1; on the other hand, A1 has a column of zeros, and thus its own simplicity.
This means also that the number at the head of a column iteration is always multiplied by zero, and so need not be calculated till the final step.
We are thus effectively operating with the third-order submatrix of A1 when its first row and column are omitted.
As before, we may find a suitable initial column such as (4).
We then perform the iteration, omitting the first element of (4): [formula] The numbers produced in this way reproduced exactly those of the last three elements in the columns of Table 1, as well as the dividing factors; they are of course exactly the same calculations, but with the matrices associated differently.
At the conclusion, the top element of the final column is found by use of the first row of A1.
We must emphasise, however, that the row vectors of A1 are not [formula], [formula].
Hence, when [formula]2 has been found, we revert to A itself to find [formula].
For the next root, we use another sweeping matrix based on the second part of (5): [formula] If this square matrix is used to postmultiply A1 we find [formula] We now iterate on the submatrix [formula] to determine the last two elements of x3; we leave the completion of the solution to the reader.,
2.7.4 Shifting and inverse iteration
Theorem IV of 1.22 noted that, in view of the identity [formula], addition of a scalar quantity to each of the diagonal elements of A implies the addition of that quantity to each of the eigenvalues of A. We note here, additionally, that the constituent matrices zii of A are unaltered by such a change.
Sylvester's expansion gives (see 1.18) [formula] whence [formula] The device of adding a scalar in this way is known as "shifting", and has a variety of uses.
For example, with hindsight, we know that the eigenvalues of A in (2.6.1.1) are 25, 12.5, 5 and 2.5. The rapidity of convergence of the initial power solution depends principally on the relative separation of the eigenvalues 25 and 12.5, i.e. 2:1.
If we reduce all eigenvalues by 7.5, they become 17.5, 5, -2.5, -5 and the relative separation is 3.5:1, which yields much quicker convergence.
Note that this is the optimum: the second and fourth eigenvalues now have equal moduli.
To select the optimum requires accurate knowledge of the eigenvalues, which at the outset of a calculation is of course not available; but if one has a general idea of the disposition of the eigenvalues, shifting can save much time.
The principal use of shifting, however, lies in inverse iteration.
Suppose, for example, that the matrix A is the system matrix of a vibrating mechanical system, and suppose further that this system is excited by a forcing vibration of frequency corresponding to [formula] = [formula].
It is desired to determine the eigenvalue (and its vectors) nearest to [formula] in order to study the response.
To approach this problem, we first note that Sylvester's expansion for negative powers of a matrix B gives [formula] It follows that the eigenvalue of B with the smallest modulus now becomes the eigenvalue of B-1 with largest modulus.
Hence, to solve our problem, we first evaluate B = A - [formula]; the eigenvalue of A nearest to [formula] now becomes the eigenvalue of B nearest zero, i.e. of smallest modulus.
(Note that A, B and B-1 all have the same constituent matrices zii.)
We therefore invert B and iterate on B⁻¹ as usual to obtain our solution.
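Shifting and inverse iteration can be sketched on a hypothetical matrix: to find the eigenvalue nearest a trial value we iterate with the inverse of the shifted matrix, and the dividing factor tends to the reciprocal of the shifted eigenvalue:

```python
import numpy as np

# Hypothetical matrix; we seek the eigenvalue nearest the trial value 4.
A = np.array([[9.0, 2.0, 1.0],
              [2.0, 5.0, 1.0],
              [1.0, 1.0, 2.0]])
mu = 4.0

Binv = np.linalg.inv(A - mu * np.eye(3))   # invert the shifted matrix B
c = np.array([1.0, 0.0, 0.0])
for _ in range(50):
    c = Binv @ c
    factor = c[-1]               # tends to 1/(lam - mu), the dominant
    c = c / factor               # eigenvalue of the inverse of B

lam = mu + 1.0 / factor          # undo the shift
ev = np.linalg.eigvals(A).real
print(abs(lam - ev[np.argmin(abs(ev - mu))]) < 1e-8)   # True
```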
Example 1
Once more we choose the matrix A defined in (2.6.1.1) and used in previous examples.
We also let [formula] = 5.5.
Then [formula] If we invert this, e.g. by the methods of 2.2, we obtain [formula] We postmultiply this by an arbitrary column co: here we choose co = e1 and iterate as usual.
The successive columns are given in Table 1.
The iterations converge rapidly to x = { 1, 0, -1, 1 } with the eigenvalue of B⁻¹ equal to -2.
Hence the eigenvalue of B is -0.5 and of A is 5.5 - 0.5 = 5, in accordance with results obtained earlier for [formula]3 and x3.
Note that, since the eigenvalues of A are in fact 25, 12.5, 5, 2.5, those of B are 19.5, 7, -0.5, -3, and of B⁻¹ therefore 0.0513, 0.1429, -2, -0.3333.
The relative separation of the two roots of largest moduli is here 6:1.
Clearly, the method is not restricted to mechanical systems: by its means we can study the eigenvalues and vectors of any matrix A which lie near any given value [formula]o of [formula].
[formula]
2.7.5 Confluent eigenvalues
So far, we have restricted our attention to unrepeated real roots of the characteristic equation belonging to a matrix A; we now consider the case of two equal real roots.
Multiple roots are in practice very rare (except perhaps for mechanical systems moving freely in space, when multiple zero roots can occur: but these have a special simplicity).
The treatment for two roots is readily extended to the multiple case if required: even for two roots, however, we require to examine two cases (see 1.21).
We assume the roots to have dominant moduli, any roots of greater modulus having been removed by deflation etc.
(i) Characteristic matrix doubly degenerate
When a repeated eigenvalue [formula]1 makes the characteristic matrix doubly degenerate, we know (see (1.21.14)) that the Sylvester expansion of the system matrix A is [formula] It follows that we can use the power method to find [formula]1; postmultiplication by co gives in the limit, when r is sufficiently large, say s, [formula] We thus find [formula]1 without difficulty; but the determination of z11 + z22 requires further consideration.
Now [formula] and [formula] is thus, in general, a linear combination of x1, [formula].
At this point, the iteration has given no indication that a double root is involved; however, this becomes apparent when we seek the row vector corresponding to cs: for all first minors of [formula]1I - A vanish and we cannot find a unique row vector.
To proceed, it is simplest to generalise (2.6.4) by partitioning off the last two rows and columns of [formula]1I - A, thus: [formula] Here [formula] is a square submatrix of order n - 2 and [formula] is square of order 2; [formula] has two columns and [formula] has two rows.
Then [formula] These algebraic equations are solved for [formula]; we now have two independent columns [formula] which annihilate [formula]1I  A when used as postmultipliers; and correspondingly for the rows [formula].
We now normalise them; we evaluate [formula] Then m is a square matrix of order 2 and, for example, [formula] are normalised rows and columns.
Finally [formula] Note that only the sum [formula] is unique.
If they are individually expressed as [formula] and [formula] then their sum can be written [formula] where M is an arbitrary non-singular matrix of order 2.
This replaces x1, [formula] with two different linear combinations (x1, [formula]) M of them, with corresponding changes for [formula].
It will be observed that in this approach we have not used cs (which we have already found); its introduction would spoil the simplicity of (2).
When x1, [formula] have been found, it is possible to check that cs is a linear combination of them.
Example 1
Consider the matrix [formula] It is assumed that at this stage we have no knowledge of its eigenvalues.
If we choose [formula] and iterate in the usual way, we find quite quickly that [formula] Normally, we would choose [formula] However, if we attempt to use (2.6.4) to find [formula], it at once emerges that all first minors of λ1I - A vanish, indicating that λ1 = 12.5 is a repeated root.
We therefore apply (2) with [formula] This gives for the solution of (2) [formula] Since the columns and rows are arbitrary to a scalar multiplier, we may now write [formula] (for simplicity of exposition we have multiplied the first column on the right by 3).
The product of the numerical matrices in (8), in that order, is (see (3)) [formula] If we premultiply [formula] we can finally write [formula] These enable us to choose, if they are required, [formula] but, following (5), we could equally well use any two different linear combinations of x1, [formula], represented by (x1, [formula]) M, provided these are associated with [formula].
(ii) Characteristic matrix simply degenerate
This case is rather more difficult: the difficulty derives from the appearance of a unit in the superdiagonal of the spectral matrix of the given matrix A. For simplicity of exposition we assume here that A is of order 4: extension to larger matrices is obvious.
We know (see 1.21) that [formula], [formula] so that [formula] on expansion of the triple product.
We know further that, in view of the properties of the constituent matrices zij, [formula] When [formula] is absent, this reverts to the simpler case dealt with under (i).
Our task now is, given A, to resolve it numerically into one of the forms (12), i.e. to find [formula]1 etc.
Since, a priori, we do not know the nature of A, we should approach this task as usual by iteration with A on an arbitrary column co.
If we write [formula] then postmultiplication of (13) by co yields [formula] When the iterations have proceeded so far, say when r = s, that contributions from λ3, λ4 are insignificant and may be neglected, then [formula] Previously, with [formula] absent, at this stage [formula] equalled [formula].
Now, owing to the last term, this is no longer true and successive columns tend to show a near arithmetical progression in their elements.
It is true that, when s is very large [formula] the columns tend ultimately to x1 (last term in cs dominant), but the number of iterations required is very large.
However, in place of the simple [formula] we can derive from (15) the relation [formula] In practice, therefore, if convergence seems very slow, we evaluate three successive columns fully, omitting the reduction of a homologous element to unity, and solve the quadratic [formula] for at least two homologous elements.
If they have a common root, we test the remaining quadratics to see if they are satisfied by this common root.
If they are, then the root is λ1 and the iterations have proceeded far enough (r = s) for only terms in λ1 to remain.
Knowing λ1 we use (15) to evaluate [formula] giving us a column proportional to x1.
Knowing λ1, we now form the matrix λ1I - A; if we put r = 0, 1 in (13), multiply the first by λ1 and then subtract the second: [formula] Premultiplication of this by [formula], or postmultiplication by x1, annihilates it; none of the other vectors does so: the matrix is rendered simply degenerate by the presence of z12.
We may, however, use (2.6.4) to find [formula]; we then know λ1 and [formula] subject to scalar multipliers.
To find [formula] we revert to (11), which yields, inter alia, [formula] or [formula] Since λ1I - A is simply degenerate, the last equations provide, in general, n - 1 independent linear equations for [formula] (and [formula]).
However, one element in each of these vectors may be assigned arbitrarily.
For, since (λ1I - A) x1 vanishes, the equation for [formula] is also satisfied by [formula], where q is arbitrary; so, for example, if we choose a non-zero element in [formula], the homologous element in [formula] may be made zero by appropriate choice of q.
Similar considerations apply for [formula].
We are thus able to solve (19) for [formula] and [formula].
Having done this, we now require to normalise: we evaluate [formula] and we may then choose our vectors as, for example, [formula], [formula] and the pair [formula] which we now call [formula].
We have now obtained [formula]1 the vectors [formula] and [formula], and the auxiliary vectors [formula] and [formula]; we are therefore in a position to deflate A (see (12)) by removing all contributions from these quantities.
The solution is then completed in the usual way.
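The quadratic test described above, fitting three successive unnormalised columns to detect a repeated dominant root, can be sketched numerically as follows. This is only an illustrative sketch: the matrix used is a hypothetical defective one (a 2 x 2 Jordan block with double eigenvalue 5), not the matrix of the worked example below.

```python
import numpy as np

# Hypothetical defective matrix: Jordan block, double eigenvalue 5.
# Elements of successive iterates grow like (alpha + beta*r)*5**r,
# so plain power iteration converges very slowly.
A = np.array([[5.0, 1.0],
              [0.0, 5.0]])

c = np.array([1.0, 1.0])
cols = [c]
for _ in range(12):                 # iterate without normalising
    cols.append(A @ cols[-1])

a, b, d = cols[-3], cols[-2], cols[-1]   # three successive full columns

# Each homologous element should satisfy  lam**2 * a - 2*lam*b + d = 0;
# the root common to all these quadratics is the repeated eigenvalue.
roots_per_element = [np.roots([a[i], -2.0 * b[i], d[i]]) for i in range(len(a))]
lam = None
for r0 in roots_per_element[0]:
    if all(np.isclose(r0, rs, atol=1e-6).any() for rs in roots_per_element[1:]):
        lam = r0.real
print(lam)
```

One root of each quadratic agrees across the elements and is the repeated eigenvalue; the other root differs from element to element and is discarded by the consistency test.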
Example 2
We require to find the modal and spectral matrices of [formula] We begin by iterating on co.
If we choose co = e4, then after about 18 steps, four successive columns are [formula] These show very slow convergence; indeed, the elements show a near-arithmetical progression.
We therefore choose, say, the first of these columns and, without reducing the last element to unity, calculate the next two columns: [formula] We choose, say, the first and last elements of these and solve [formula] The common root 12.5 also satisfies the middle two quadratics; hence we find λ1 = 12.5.
We now use (17) and find { 2.1875, 4.6875, 7.5, 13.125 } - 12.5 { 0.165, 0.355, 0.57, 1 } = { 0.125, 0.25, 0.375, 0.625 }, and hence [formula] may be taken as { 1, 2, 3, 5 }.
Using (21) we now evaluate [formula] and it may be checked that postmultiplication by [formula] gives a null product.
We use (22) as in (2.6.4) to find [formula]; we leave the solution to the reader: it is found that [formula] which, when used to premultiply (22), annihilates it.
We now proceed to find the auxiliary vectors.
If we write [formula] (the first element of [formula] is nonzero, so that of [formula] may be taken as zero) then we solve, using the appropriate submatrices (see (19)) [formula] The solution is [formula].
A similar calculation yields [formula].
We must now normalise these.
We evaluate [formula] Note that here, since the second row and first column of the product are necessarily orthogonal, m is triangular; also in this case it happens that m⁻¹ = m.
Accordingly, we may choose to retain the columns [formula] and to adopt the rows, orthogonal to [formula], [formula], [formula] We are now able to evaluate [formula] and [formula] We can now deflate A (see (21)) to [formula] Note that tr A1 = 7.5; also A1 is doubly degenerate, being annihilated by postmultiplication by both [formula] and [formula].
Iteration with A1 on co = e4 produces quite quickly the simple eigenvalue λ3 = 5 and vector x3 = { 1, 0, -1, 1 }.
We then evaluate 5I - A1 and employ (2.6.4) to obtain [formula].
Normalising this with x3, we obtain [formula] and we are now able to deflate A further to [formula] Since [formula] and [formula] is the only remaining eigenvalue, [formula] and [formula] can be written [formula] which completes the solution.
The results may be expressed compendiously as [formula] The numbers are not of course unique.
We could, for example, divide x4 by any factor f provided we multiply [formula] by f.
However, if for example we divide [formula] by 10 and multiply [formula] by 10, we must replace the unit in the superdiagonal of [formula] by 10.
2.7.6 Conjugate complex eigenvalues and vectors of a real matrix
So far, we have restricted our studies to real eigenvalues.
However, complex eigenvalues and vectors are of frequent occurrence, and require special consideration.
First, if the matrix is real, then its characteristic equation must have real coefficients.
Hence, if it is satisfied by [formula], then it will also be satisfied by [formula]; complex eigenvalues of a real matrix therefore occur in conjugate pairs.
In a similar way, if the eigenvector corresponding to [formula] is [formula], then the relation [formula] implies the conjugate relation; hence p - iq is the eigenvector corresponding to the eigenvalue [formula].
Evidently, the vector in (1) is arbitrary to a scalar complex multiplier, say [formula] + [formula]; another vector satisfying (1) is thus [formula] Any arbitrary linear combination of p and q may therefore be taken as the real part of the vector, with a corresponding combination for the imaginary part.
If we expand (1) we obtain successively [formula] If we eliminate q we find [formula] The same equation is satisfied by q.
This suggests that the square matrix in it is doubly degenerate and that p, q can each have two arbitrary elements; but this is not so.
When [formula] p are known, q is uniquely related to p by [formula] in which [formula] is not singular.
What is said above gives a background to what follows.
We assume once more that [formula] have dominant moduli, any eigenvalues of greater moduli having been removed by deflation etc.
Again, a priori, we assume nothing is known of the nature of A, so that the process of finding the dominant eigenvalue and its vector is approached as usual by iterating with A on an arbitrary column co.
Once again, for simplicity, we assume A to be of order 4.
The Sylvester expansion of A gives [formula] Accordingly, if we perform the usual iteration on co and continue until, with r = s, terms in λ3 and λ4 are negligible, then [formula] where [formula] is a complex scalar.
Now, as we have seen, (p + iq) is arbitrary to a complex scalar multiplier (and p - iq similarly).
We are therefore at liberty to choose [formula] and then [formula] Elimination of p, q from these three equations yields [formula] Note that, since [formula] this is precisely the same as Equation (3); however, in the iteration case, it is only true when the iterations have proceeded sufficiently far for the terms in λ3, λ4 to disappear, when cs consists only of p.
Now in practice, with a pair of complex eigenvalues, the elements in [formula] tend to change sign periodically: there is no point in reducing a homologous element to unity, since there is no convergence to a settled form.
When [formula] behave in this way, conjugate complex eigenvalues must be suspected; and then three consecutive columns must be periodically tested to see whether various pairs of homologous elements give consistent values for [formula].
When this happens, the iteration has gone far enough, and we can take p = cs, when q is determined from the second of equations (6), which is in fact the same as (4).
We now know μ, ν, p and q.
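The test just described, checking whether three consecutive columns give consistent values in the quadratic, can be sketched as follows. The matrix used is a hypothetical illustration chosen to have the conjugate pair 1 ± i as its only eigenvalues; it is not an example from the text.

```python
import numpy as np

# Hypothetical real matrix with complex eigenvalues 1 +/- i: the iterates
# change sign periodically and never settle to a fixed shape.
A = np.array([[0.0, 1.0],
              [-2.0, 2.0]])

c0 = np.array([1.0, 0.0])
cols = [c0]
for _ in range(8):                   # iterate without normalising
    cols.append(A @ cols[-1])

cs, cs1, cs2 = cols[-3], cols[-2], cols[-1]

# Each homologous element satisfies  cs2 = sigma*cs1 - prod*cs, where
# sigma = 2*mu (sum of the pair) and prod = mu**2 + nu**2 (their product).
# Two homologous elements give a 2x2 linear system for (sigma, prod).
M = np.array([[cs1[0], -cs[0]],
              [cs1[1], -cs[1]]])
sigma, prod = np.linalg.solve(M, cs2)

# The pair are the roots of  lam**2 - sigma*lam + prod = 0.
lam = np.roots([1.0, -sigma, prod])
print(lam)
```

In practice one would test several pairs of homologous elements for consistent (sigma, prod) values, exactly as the text prescribes, before accepting the result.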
Finding [formula] is not so straightforward.
Knowing [formula] we can either (i) solve [formula] with one element of [formula] arbitrarily assigned; we then have to solve a set of linear algebraic equations with complex coefficients; (ii) iterate with A postmultiplying an arbitrary row [formula]; this will lead to [formula] (and [formula] again) as for p, q; or (iii) solve the equations which parallel (3) and (4): [formula] Since [formula] are known, this, like (ii), involves only real numbers; on the other hand, it requires [formula], the evaluation of which requires considerable computer time if the order of A is large.
On the whole, since most modern computers can work with complex numbers, (i) seems the most straightforward.
When we have obtained [formula] we can perform a check, which is in any event needed when the vectors are normalised.
For since [formula] belong to different eigenvalues, they are orthogonal, and [formula] so that [formula] For the process of normalisation, we require to evaluate [formula] and m is evidently diagonal; in view of the check equations (9) it can be written [formula] the determinant of which is [formula], so its reciprocal is readily written down and we can obtain normalised forms of [formula].
We can now deflate A. Since [formula] a deflated matrix A1 is [formula] We can therefore use A1 to obtain the remaining eigenvalues and vectors of A in the usual way.
Example 1
Let [formula] If we use this for iterative premultiplication of an arbitrary column, say the summing vector, successive columns are as shown in Table 1.
[formula] It is apparent from the first that the vector elements are changing sign; so we do not reduce a homologous element to unity (though in general we might periodically remove a power of 10).
If, starting say at c5, we test to see if pairs of homologous elements give consistent values for [formula], [formula] in (7), we find this is not so.
However, [formula] give nearly consistent values; we therefore evaluate [formula] accurately: [formula] If these three columns are used in (7), then any two pairs of homologous elements give [formula].
We thus obtain [formula] Also, we may now choose [formula] and using the second of Equations (6) we now find [formula] We now have to find [formula], and for this we use (8).
One element of [formula] can be arbitrarily assigned: here we assign the value 1 to the third element and solve [formula] which, if we omit one column (e.g. the last) from the square matrix, gives [formula] We omit the details of solution; the answer is [formula] Note that [formula] Accordingly, [formula] and if we postmultiply p + iq as given by (14) and (15) by m⁻¹ we obtain as our normalised p, q, [formula] We are now able to deflate A. Equation (11) becomes, with A given by (12) and use of (13), (14), (15) and (16), [formula] A1 is, in fact, doubly degenerate; it is annihilated by both the columns immediately above it.
Iteration with A1 on an arbitrary column quickly leads to the result [formula] Further deflation leads at once to [formula] which completes the solution.
2.7.7 Adjacent eigenvalues: Two eigenvalues with nearly-equal moduli
When two eigenvalues have nearly-equal moduli, it is obvious that very many iterations will be required for a direct solution.
The following device can be helpful in this case.
Suppose the iterations have proceeded so far that only contributions from the two nearly-equal eigenvalues remain.
Then we can write, when a homologous element is not made unity, [formula] Here a, b, c, d; p, q are the values of a homologous element in the various columns; a, b, c, d are known.
Elimination of p, q yields two equations which may be solved for λ1 + λ2 and λ1λ2: [formula] Hence, λ1, λ2 are readily found.
When they are known, then [formula] which follow from the first two of Equations (1).
When λ1, λ2 are very close, both numerator and denominator in (2) tend to be small differences, so that accurate calculation of a, b, c, d is desirable.
In practice, iteration in this case behaves rather like that for a defective matrix (2.7.5), in that convergence is slow and successive columns tend to a quasi-arithmetic progression of the elements.
The occurrence of nearly-equal roots is more common than that of a defective matrix; hence tests to see whether homologous elements yield, in (2), consistent results should be applied first when convergence is slow.
[formula]
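The device of this section can be sketched numerically as follows. The matrix below is a hypothetical illustration with the nearly-equal eigenvalues 5.0 and 4.9 (not the matrix of the example that follows), so direct iteration would be slow.

```python
import numpy as np

# Hypothetical matrix with nearly-equal eigenvalues 5.0 and 4.9.
A = np.array([[5.0, 1.0],
              [0.0, 4.9]])

col = np.array([1.0, 1.0])
cols = [col]
for _ in range(20):                  # iterate without normalising
    cols.append(A @ cols[-1])

# Four successive values a, b, c_, d of one homologous element satisfy
#   c_ = sigma*b - prod*a   and   d = sigma*c_ - prod*b,
# where sigma = lam1 + lam2 and prod = lam1*lam2.
a, b, c_, d = (cols[k][0] for k in (-4, -3, -2, -1))
sigma, prod = np.linalg.solve(np.array([[b, -a], [c_, -b]]),
                              np.array([c_, d]))

lam = np.roots([1.0, -sigma, prod])  # lam**2 - sigma*lam + prod = 0
print(sorted(lam.real, reverse=True))
```

As the text warns, the determinant of the small system is a difference of nearly-equal large numbers, so a, b, c, d must be carried to full accuracy.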
Example 1
[formula] Iteration on a column, beginning with co = e4, yields successive columns as given in Table 1.
At this stage, we make consecutive accurate calculations, to see if we can get consistent values for [formula]: [formula] Using the elements of the last line in (2) we obtain [formula] The elements of the first line give [formula] The results are consistent, as indeed are the results from the second and third lines.
By inspection (or from the implied quadratic) [formula] We now evaluate the eigenvectors: [formula] so that we may take [formula] It now remains to find the corresponding row vectors and to deflate A for further study; we leave this to the reader.
However, we may note here that [formula] whence [formula].
2.7.8 Applicability of power methods
We have seen that the power method can be used to obtain dominant eigenvalues and the associated vectors, and, by deflation, sweeping, shifting and inverse iteration, can also be used to find all non-dominant eigenvalues and their vectors; the results are obtained seriatim.
This implies that, if one is interested only in a limited number of eigenvalues, the power method is the obvious choice.
For example, suppose we have a mathematical model with n = 100 for study of the vibration characteristics of an aircraft.
The aeroelastic engineer will usually be interested only in the fundamental mode of vibration and a few overtones, say 10 in all; in the usual model this implies the dominant and the 9 immediately subdominant eigenvalues and vectors.
The vibration engineer will be concerned with a band of frequencies near the engine rotational speed, and so will use shifting and inverse iteration.
But in any event, the mathematical model will usually involve the finite element concept which gives accurate values for the lower band of frequencies, but is often quite unrepresentative of the top frequencies.
Evaluation of all frequencies and modes is therefore not normally required.
For such a model, power methods are the obvious choice.
However, if interest attaches to all eigenvalues, and in particular if there is less concern with the associated vectors, other methods may be used.
We now begin a brief study of transformation methods.
2.8 TRANSFORMATION METHODS
In these methods, we apply a succession of similar transformations to a matrix A: [formula] and so on, until A is transformed into a form giving the eigenvalues directly: it may be a triangular form, or the ultimate canonical form [formula] (or [formula] if A is defective).
Since the transformations are similar, all the derived matrices B, C, D,... have the same eigenvalues as A; and if we proceed to the diagonal form, then the chain product X1X2X3... gives the modal matrix X.
2.8.1 Jacobi's method for a real symmetric matrix
In 1.22, Theorems VIII and X, we have shown that a real symmetric matrix has real eigenvalues and that it cannot be defective.
Moreover, its modal matrix is orthogonal; hence [formula] Jacobi's method is as follows.
We begin by searching the off-diagonal elements of A (since A is symmetrical, we usually use the upper half only) to find the element Auv of greatest modulus.
We then construct the orthogonal matrix [formula], which is the unit matrix except that the units in the (u, u) and (v, v) positions are each replaced by c and the zeros in the (u, v) and (v, u) positions by -s and s respectively.
Then [formula] and we may evaluate [formula] of which the relevant submatrix is (omitting all other rows and columns) [formula] which yields [formula] We now choose θ1 such that Buv vanishes; i.e. [formula] This fixes θ1.
If tan 2θ1 is positive, we take 2θ1 to lie in the first quadrant; if negative, in the fourth. Then θ1 lies between ±π/4, so that c = cos θ1 is positive and s = sin θ1 has the sign of tan 2θ1.
With c, s fixed, we can now evaluate B fully.
Elements other than in the uth, vth columns and rows are unaltered from those in A; at the intersections of the uth, vth columns and rows the new elements are given by (3) (with Buv = 0), while the remaining elements Biu, Biv in the uth, vth columns and rows are [formula] We may make two deductions.
First, from (3) we have [formula] and since only these diagonal elements have changed, it follows that [formula] as it must in a similar transformation.
Next, from (5), squaring and adding, we have [formula] and since only the uth, vth columns and rows have changed, we deduce that the sum of the squares of all the off-diagonal elements of B is less than that of A, since we have replaced the two elements Auv of A by zeros in B. Thus [formula] Having evaluated B, we repeat the procedure to find C; we select the element Bpq of largest modulus and find [formula] from (see (4)) [formula] we then proceed as before to construct [formula] and [formula].
C will have Cpq = 0 and may also have Cuv = 0 (if [formula]); but this situation will not persist.
At some stage, in general, elements formerly made zero will again become nonzero, though usually smaller than before.
Thus Jacobi's method does not terminate in a finite, predictable number of steps; indeed, the number may be very large.
Nevertheless, following (6) [formula] and, in an overall sense, the off-diagonal elements therefore become progressively smaller, until eventually we obtain a matrix in which the off-diagonal elements are all vanishingly small: i.e. a diagonal matrix of which the diagonal elements are the eigenvalues.
We give below an example, but must enter the caveat that because of its small order (n = 3) it converges rapidly.
At each stage, the sum of the squares of the off-diagonal elements decreases by at least one third.
However, if we had a matrix of order 100, it would have 9900 off-diagonal elements, and we can only say that at each stage the sum of the squares of these elements would decrease by 1/4950 at least; convergence to diagonal form is likely to take very much longer than for a small matrix.
We may add two observations.
The method is viable if Auv is reduced, not to zero, but to some lesser modulus; nevertheless, reduction to zero is obviously best.
Also, the method requires only that A is real and symmetric: its state of definiteness is irrelevant.
In the example below, A has a negative eigenvalue.
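The whole Jacobi cycle described above can be sketched as follows. The test matrix is a hypothetical symmetric one (not the example that follows), and the rotation is built exactly as in the text: locate the off-diagonal element of largest modulus, choose the angle so that it vanishes, and repeat until the off-diagonal part is negligible.

```python
import numpy as np

def jacobi(A, tol=1e-12, max_rotations=10000):
    """Diagonalise a real symmetric matrix by Jacobi rotations (sketch)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    X = np.eye(n)                      # accumulates the modal matrix
    for _ in range(max_rotations):
        # locate the off-diagonal element A[u, v] of largest modulus
        off = np.abs(A - np.diag(np.diag(A)))
        u, v = np.unravel_index(np.argmax(off), off.shape)
        if off[u, v] < tol:
            break
        # choose theta so that the (u, v) element of the transform vanishes:
        # tan 2*theta = 2*A[u,v] / (A[u,u] - A[v,v])
        theta = 0.5 * np.arctan2(2.0 * A[u, v], A[u, u] - A[v, v])
        c, s = np.cos(theta), np.sin(theta)
        R = np.eye(n)
        R[u, u] = R[v, v] = c
        R[u, v], R[v, u] = -s, s
        A = R.T @ A @ R                # similar (orthogonal) transformation
        X = X @ R
    return np.diag(A), X

# Hypothetical symmetric test matrix (it has one negative eigenvalue).
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.5],
              [2.0, 0.5, -1.0]])
eigs, X = jacobi(A)
print(np.sort(eigs))
```

Note that, as the text observes, definiteness is irrelevant: the method only requires A to be real and symmetric.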
Example 1
Let [formula] We select the off-diagonal element of largest modulus (underlined) and evaluate (we record only six decimal places) [formula] Hence [formula] and then [formula] Note that [formula] also [formula] thus we have effected a considerable overall reduction.
The second step is to obtain C. For this we select the underlined element in B and evaluate in succession [formula] and then [formula] Note that the zero elements in B have become nonzero in C; however, [formula] is now 2.19132; compare [formula] above.
Proceeding in this way, after six transformations, we find to the order of accuracy employed [formula]
2.8.2 The LR and QR methods
We now describe briefly two methods which are applicable to any square matrix A, whether symmetric or not, real or complex, simple or defective, non-singular or singular.
Here, we shall merely state and illustrate the methods; for proofs, readers are referred to Wilkinson (6).
We may note, however, that while a complex matrix A = B + iC (B, C real) may be treated as such, using complex arithmetic, it is more usual to treat its real equivalent [formula] which (see Theorem XI, 1.22) yields the eigenvalues and vectors of A and its conjugate.
2.8.3 The LR algorithm
It is a straightforward process, readily programmed for a computer, to resolve a matrix A into the product L1R1 where L1 is a lower triangular matrix having units in its diagonal, and R1 is an upper triangular matrix.
For a model, we here take n = 4; it is typical of any order.
Let [formula] [formula] Apart from the top row of [formula] which by inspection is identical with that of A, we have here 12 equations and 12 unknowns.
From the first column of the product we have [formula] which determine p, q, r.
The second column yields [formula] which, with p, q, r known, determine in succession x22, s, t.
The remaining unknowns [formula]3, x33, n and the last column of R1 are found progressively in the same way.
Note, however, that A11, x22, x33,... are divisors (pivots) in this procedure, so that a pivoting strategy may be needed.
Note also that [formula], so [formula]; hence, if A is non-singular, the diagonal elements of R1 must all be nonzero.
Indeed, this procedure is very suitable for the numerical evaluation of determinants.
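The progressive resolution A = L1R1 just described can be sketched as follows, in Doolittle form (unit diagonal in L) and without pivoting; the test matrix is hypothetical and chosen so that no zero pivot arises.

```python
import numpy as np

def lr_factor(A):
    """Resolve A = L R with L unit lower triangular, R upper triangular
    (no pivoting: assumes every pivot R[k, k] is nonzero)."""
    n = A.shape[0]
    L = np.eye(n)
    R = np.zeros_like(A, dtype=float)
    for k in range(n):
        # kth row of R, then kth column of L, by progressive substitution
        R[k, k:] = A[k, k:] - L[k, :k] @ R[:k, k:]
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ R[:k, k]) / R[k, k]
    return L, R

# Hypothetical test matrix
A = np.array([[4.0, 3.0, 2.0],
              [2.0, 4.0, 1.0],
              [1.0, 2.0, 3.0]])
L, R = lr_factor(A)
# As noted in the text, det A is the product of the diagonal of R.
print(np.prod(np.diag(R)), np.linalg.det(A))
```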
The [formula] algorithm resolves A into the product [formula] as above, and then multiplies the factors in reverse order to obtain a new matrix B. Since [formula] and then [formula] is a similar transform of A. B is, in general, a fully populated matrix of which the last column is that of R1 (thus A14 reappears in B) and the last row that of L1 multiplied by x44.
When B is found, it is in turn resolved into the product L2R2 and [formula] evaluated, and so on.
As the succession of similar transforms proceeds, it is found that (a) the elements below the diagonal become progressively smaller, (b) the diagonal elements tend to the eigenvalues, in descending order of modulus down the diagonal.
When the process is well established, it is found that (with n = 4 as our model) the element [formula] becomes approximately [formula] in [formula].
When this element becomes very small [formula] tends to decrease by the factor [formula] and [formula] by [formula] and so on.
Accordingly, if the eigenvalues of A are widely spaced, convergence of the [formula] method is fairly rapid; if, however, there is a pair or group of nearly-equal eigenvalues, then very many iterations may be required before the end result is achieved.
This is an upper triangular matrix T, of which the diagonal elements are the (real) eigenvalues in descending order of modulus, and which still retains the element A14 in the top right position.
If A has a pair of conjugate complex eigenvalues, then T is not strictly upper triangular: the diagonal includes a block of order 2 such as [formula] with [formula] on the diagonal of T and [formula] below it.
In this case, the solution of [formula] yields the complex eigenvalues.
The numbers [formula] are not in general constant under transformation; but if the iterations have proceeded so far that T is in the quasi-triangular form, the trace and determinant of (3) are invariant, so that the complex roots are fixed.
As the LR transformations proceed, the most rapid convergence is to the smallest eigenvalue in the bottom right-hand corner.
Indeed, if there is a zero eigenvalue, it appears in the first transformation, i.e. in B. It follows that shifting (see 2.7.4) can be used with great advantage to accelerate convergence.
When the element in the bottom right-hand corner has reached a settled value (i.e. the eigenvalue), we can deflate the transform, by omission of its last row and column, before continuing the iterations with a matrix of reduced order.
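The LR iteration itself, factor, multiply in reverse order, repeat, can be sketched as follows. It uses an unpivoted Doolittle resolution, and the test matrix is a hypothetical one with well-separated eigenvalues so that convergence is quick.

```python
import numpy as np

def lr_step(A):
    """One LR step: A = L R (unit lower L), return B = R L = L^{-1} A L."""
    n = A.shape[0]
    L = np.eye(n)
    R = np.zeros_like(A, dtype=float)
    for k in range(n):
        R[k, k:] = A[k, k:] - L[k, :k] @ R[:k, k:]
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ R[:k, k]) / R[k, k]
    return R @ L                       # multiply the factors in reverse order

# Hypothetical test matrix with eigenvalues 7 and 2.
A = np.array([[6.0, 2.0],
              [2.0, 3.0]])
B = A.copy()
for _ in range(40):
    B = lr_step(B)
print(np.diag(B))   # tends to the eigenvalues, largest modulus first
```

The subdiagonal element decays roughly by the ratio of the two eigenvalue moduli at each step, which is why widely spaced eigenvalues converge quickly and nearly-equal ones do not.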
Example 1
We choose the deflated matrix of (2.7.2.2) as A: [formula] Since [formula], the bottom right-hand element of [formula] vanishes, as it should.
We now evaluate the product in reverse order: [formula] The reader is invited to check that, if B is resolved as it stands into [formula] then C = [formula] will also have its last row null, the zero eigenvalue thus repeating.
We may therefore deflate B, omitting its last row and column and continue the iterations with the leading first minor of B. It is found that, after 22 more iterations, working to six decimal places, [formula] Thus, the eigenvalues of A are, in descending order of modulus, 12.5, 5.0, 2.5, 0.
The whole operation may in fact be summarised as LT = AL: [formula] Recovery of the eigenvectors in the LR procedure is not straightforward, especially if deflation has been used.
Since the eigenvalues are known, probably the simplest method is to use (2.6.4).
However, if we have not deflated, then clearly [formula] leads to (see (5) above) [formula] We now require to transform T to [formula].
If TP = PA, then typically [formula] where PA has been multiplied out.
Identification of the elements on each side, beginning with the superdiagonal, leads to [formula] giving a, d, f.
The remaining unknowns, b, e, c are found progressively.
With P determined in this way, we have [formula] so that the modal matrix of A is the product LP.
The reader is invited to evaluate LP, using the numerical values of (5) in (6), and in particular to note that the eigenvector belonging to the zero root is that originally associated with the eigenvalue 25 in (2.7.2.6).
The LR procedure is thus simple and easily programmed.
However, in practice, it normally requires very many iterations before convergence is achieved; moreover, the process of recovering eigenvectors outlined above is apt to be ill-conditioned, and some other procedure is usually to be preferred.
2.8.4 Pre-reduction to upper Hessenberg form
As we have seen, the LR procedure (and the QR method, to be discussed shortly) annihilates the elements of A below the diagonal.
This is, however, usually a very long process; and while it can be accelerated by shifting and deflation, it is in fact better to depopulate A below the diagonal, as far as possible, as an initial step.
An upper Hessenberg matrix has only zeros below its infradiagonal; and it is possible to transform any matrix A to this form by a similar transformation.
This is the best that can be done to depopulate a matrix below its diagonal; if one could remove infradiagonal elements as well, there would be no problem in finding eigenvalues and vectors of a matrix.
We illustrate reduction to upper Hessenberg form with a general matrix of order 4; it is clear that the procedure applies to matrices of any order greater than 2.
We write AK = KH as [formula] Just as we have done earlier, we can evaluate the n2 unknowns progressively.
The first column of the product gives [formula] which determine h11, h21, p, q (note that here, also, a pivoting strategy may be needed).
From the second column [formula] which yield in succession [formula] and so on.
Thus, by progressive substitution, we arrive at the Hessenberg matrix [formula]; it possesses the same eigenvalues as A. If we apply the LR or QR algorithms to H, the Hessenberg form is retained at each step; thus, instead of removing all the elements below the diagonal, we have to remove only those in the infradiagonal.
Much computer arithmetic and time is thus saved.
Transformation of A to H therefore greatly improves the LR algorithm; we have introduced it here, however, because for the QR method it is a sine qua non.
We may add here, in passing, that it is possible by simple substitutions to reduce any matrix A to tridiagonal form  populated only in the superdiagonal, diagonal and infradiagonal.
However, for our present purposes this is unnecessary; and in any event it often involves ill-conditioned equations and so lacks accuracy.
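Pre-reduction to upper Hessenberg form can be sketched with a library routine as follows. Note that SciPy's routine uses orthogonal (Householder) similarity rather than the elementary transformation AK = KH of the text, but the effect is the same: zeros below the infradiagonal and unchanged eigenvalues. The test matrix is a hypothetical one.

```python
import numpy as np
from scipy.linalg import hessenberg

# Hypothetical symmetric test matrix of order 4.
A = np.array([[4.0, 1.0, 2.0, 3.0],
              [1.0, 3.0, 0.0, 1.0],
              [2.0, 0.0, 5.0, 2.0],
              [3.0, 1.0, 2.0, 1.0]])

# A K = K H, with K orthogonal here and H upper Hessenberg.
H, K = hessenberg(A, calc_q=True)
print(np.round(H, 6))
```

Since H is a similar transform of A, applying the LR or QR algorithm to H yields the eigenvalues of A, with only the infradiagonal left to annihilate.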
Example 1
We again choose the matrix A of (2.8.3.4).
The reader is invited to check that, if A is transformed as in (1), then [formula] If we now treat H as our basic starting matrix A and resolve it into L1R1 then [formula] As will be seen from [formula], resolution of an upper Hessenberg matrix requires an upper Hessenberg L1, i.e. a matrix with zero elements everywhere except in the diagonal and infradiagonal.
Both the resolution of A into L1R1 and the subsequent evaluation of [formula] (B is also of upper Hessenberg form) are thus much simpler than when A is fully populated.
2.8.5 The QR algorithm
We assume that the matrix A is already in upper Hessenberg form.
Then, in parallel with the LR algorithm, we write [formula] where [formula], as before, is upper triangular.
However, here we do not resolve A into [formula]; instead, we require that [formula] shall be an orthogonal matrix: [formula] Thus [formula] is our similar transformation of A into B. We then transform B into C: [formula] and so on, until the transform is eventually an upper triangular matrix having the eigenvalues of A in its diagonal.
It remains to explain how Qi is chosen.
There are various possibilities; the most popular methods are those of Givens and Householder (6).
Here we describe Givens' method; and as before, for simplicity of exposition, we choose n = 4.
Let the matrix, pre-reduced to upper Hessenberg form, be [formula] Also, let [formula] where [formula].
Then if we evaluate [formula] as a chain, beginning by premultiplying A by the last of the three matrices in (4), and if also we take [formula] then A is first replaced by a matrix in which the leading element of the infradiagonal vanishes.
The top two rows of A are altered; in particular, A22 is replaced by [formula].
In the next multiplication we take [formula] and then the second element in the infradiagonal vanishes.
Finally, with A33 now altered to [formula] we put [formula] and complete the chain.
The result is the upper triangular matrix R1.
We now have to complete the similar transformation by postmultiplying by Q1; the result, [formula] is again of upper Hessenberg form.
We now repeat the cycle to obtain [formula] and so on.
As before, it is found that the infradiagonal elements become smaller at each step until the similar transform of A is ultimately upper triangular (or quasi-triangular if A has complex eigenvalues); i.e. a matrix T given by [formula] The following point is important.
In our treatment above, B is the product of a chain of seven matrices.
When the multiplications are carried out in the order indicated, the upper Hessenberg form is maintained throughout. If, however, we begin by multiplying the central three matrices, the upper Hessenberg form is lost, with an accompanying loss of simplicity.
It is now apparent why A must be pre-reduced to upper Hessenberg form.
Equation (4) employs one matrix for each nonzero element below the diagonal: three in this case, n − 1 in general.
There are thus 2n − 2 multiplications in each iteration.
But if A were fully populated, the number of multiplications would be n(n − 1).
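A single QR transformation by Givens' method may be sketched thus (a modern sketch, not the text's own notation; the function name and the test values below are illustrative). The chain of rotations annihilates the infradiagonal to give the upper triangle R1, and postmultiplication by Q1 completes the similar transform:

```python
import numpy as np

def givens_qr_step(H):
    # One QR step on an upper Hessenberg H: n-1 Givens rotations
    # (premultiplications) reduce H to the upper triangle R1; the
    # postmultiplications then give B = Q1^T H Q1, again Hessenberg.
    B = np.array(H, dtype=float)
    n = B.shape[0]
    rots = []
    for k in range(n - 1):                # annihilate the infradiagonal
        a, b = B[k, k], B[k + 1, k]
        r = np.hypot(a, b)
        c, s = a / r, b / r
        G = np.array([[c, s], [-s, c]])   # rotation in the (k, k+1) plane
        B[k:k + 2, k:] = G @ B[k:k + 2, k:]
        rots.append(G)
    for k, G in enumerate(rots):          # complete the similar transform
        B[:, k:k + 2] = B[:, k:k + 2] @ G.T
    return B
```

Repeating the step drives the infradiagonal elements towards zero, leaving the eigenvalues in the diagonal.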
Example 1
The reader is invited to work through this example; we avoid repetitive labour by the use of hindsight.
Consider the matrix defined in (2.7.6.12) and reduce it, as in (2.8.4.1), to upper Hessenberg form.
The result is a new matrix [formula] We now, with hindsight, use a shift of 0.2; i.e. we work with the matrix A − 0.2I.
Shifts are always used to improve convergence: this particular shift requires only one step.
The matrix is now [formula] and after the six multiplications implied (see (4)) in [formula] it becomes [formula] Thus A − 0.2I has a zero eigenvalue; we recover B by adding 0.2I and can then deflate and consider the reduced matrix [formula] Once again, to avoid repetitive labour in the example, we use hindsight: we now apply a shift of 0.4 and consider [formula] We require only one step (four multiplications) to reduce this to [formula] We now recover [formula] by adding 0.4I and deflate by omission of the last row and column: [formula] In any similar transform of [formula] the trace and determinant are unaltered.
The two eigenvalues of [formula] are thus determined by [formula] The required eigenvalues are thus 0.2 (bottom right-hand element of B), 0.4 (bottom right-hand element of [formula]) and [formula].
The convergence properties of the QR algorithm are superior to those of LR, particularly when shifts are used and deflation is performed whenever possible.
If complex eigenvalues exist, "double shifts" (see Wilkinson (6)) may be made and two QR steps performed simultaneously to give rapid convergence while all arithmetic is kept real.
The QR method is thus very powerful and of quite general applicability; it is probably the most popular technique on modern main-frame computers for the evaluation of eigenvalues of non-symmetric matrices.
On the other hand, recovery of eigenvectors, though possible, is very difficult and often inaccurate, and some quite different routine (e.g. (2.6.4), when the eigenvalues have been found by QR) is recommended.
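The whole process of shifting, iterating and deflating may be sketched as follows. For brevity this sketch (ours, not the text's) uses a library QR factorisation instead of the explicit Givens chain, takes the bottom right-hand element as the shift, and assumes a real spectrum, so that no double shifts are needed:

```python
import numpy as np

def qr_with_shifts(A, tol=1e-12, max_iter=500):
    # Eigenvalues of a real matrix with real spectrum by the shifted
    # QR algorithm with deflation: iterate until the last infradiagonal
    # element is negligible, record the bottom right-hand element as an
    # eigenvalue, then deflate by omitting the last row and column.
    H = np.array(A, dtype=float)
    eigs = []
    while H.shape[0] > 1:
        n = H.shape[0]
        for _ in range(max_iter):
            if abs(H[n - 1, n - 2]) < tol * (abs(H[n - 1, n - 1])
                                             + abs(H[n - 2, n - 2]) + 1.0):
                break
            shift = H[n - 1, n - 1]                 # corner-element shift
            Q, R = np.linalg.qr(H - shift * np.eye(n))
            H = R @ Q + shift * np.eye(n)           # similar transform
        eigs.append(H[n - 1, n - 1])
        H = H[:n - 1, :n - 1]                       # deflate
    eigs.append(H[0, 0])
    return sorted(eigs)
```

In a production routine the matrix would first be reduced to Hessenberg form, as the text insists, so that each factorisation costs far less.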
2.9 MATRIX PENCILS
Our discussion of this topic is very limited, though we look again at the subject, also briefly, in 2.10.
2.9.1 Eigenvalues and vectors of matrix pencils
Any lambda-matrix of the form [formula]C − B, which can in general be rectangular, is described as a matrix pencil (see 1.15).
If C is square, of order n, and non-singular, the pencil is described as regular, since in the polynomial equation obtained by expansion of [formula] the coefficient [formula] does not vanish, so that there are n eigenvalues [formula]i which satisfy (1).
It is clear that these eigenvalues belong also to the matrix [formula].
For each root [formula]i there will be at least one linear relation between the columns of the pencil, so that we may write [formula] and [formula] is an eigenvector of the pencil.
The pencil is described as simple if there are n independent vectors [formula] so that we may write the set compendiously as [formula] where [formula] is of simple diagonal form, even though it may include multiple roots.
If, on the other hand, one or more auxiliary vectors are required, so that the spectral matrix takes the form [formula], then the pencil is defective.
In particular, a pencil is simple if B, C are real and symmetric and if C is pos. def.
This proposition is proved in Corollary 1 of Theorem X of 1.22.
However, it does not follow that if B, C are real and symmetric, the pencil is simple, or its roots real.
The reader is invited to study the two pencils [formula] Pencil (i) is real and symmetric; it has two equal eigenvalues [formula] = 1 and the spectral matrix is of elementary Jordan block form; it is defective, with only the single eigenvector{ 2,1 }.
Pencil (ii) also has real, symmetric coefficients; but its eigenvalues are [formula], with independent vectors{ 1,1 + 1 } and { 1,1 -i }.
It is thus simple.
In both (i) and (ii), [formula] is negative, and so C is not pos. def. (see 1.22, Theorem X).
If B, C are real and symmetric, and if in addition C is pos. def. then as in Theorem VIII of 1.22, the eigenvalues are real and finite and the eigenvectors real.
We now restrict our attention to simple pencils, whether symmetric or not.
In parallel with (2) we have the equation [formula] and this leads to the counterpart of (3): [formula] Postmultiply this by X and use (3); then [formula] For simplicity, let us now suppose all roots [formula]i to be distinct.
Then [formula] permutes with a diagonal matrix and is therefore itself diagonal: [formula] But X (see (3)) is arbitrary to a postmultiplying diagonal matrix; we may thus write [formula] and then from (5) [formula] In the special case where B, C are symmetric, we may evidently identify Y with X.
So much for the elementary properties of pencils.
We turn now to their numerical treatment.
First, if we have a simple pencil [formula]C − B, we may premultiply it by C-1 to obtain [formula]I − A, A = C-1B; we may then obtain the eigenvalues and vectors of the pencil by applying any of the methods of 2.6 to 2.8 to A, and often this is the simplest procedure.
However, cases can arise where this is not the best approach.
For example, suppose C to be tridiagonal and B diagonal.
Then C-1 is in general fully populated, and so would be C-1B; the simplicity of the pencil is thus lost in A. The same is true when C, B are sparse.
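The point may be verified in a small computational sketch (ours; the values are illustrative, not those of the text's example): although A = C-1B is fully populated and unsymmetric, every eigenpair of A still annihilates the pencil.

```python
import numpy as np

# An illustrative pencil lC - B with C tridiagonal, symmetric and
# B diagonal; the simplicity of the pencil is lost in A = C^{-1}B.
C = np.array([[2., -1, 0], [-1, 2, -1], [0, -1, 2]])
B = np.diag([1., 2., 1.])

A = np.linalg.solve(C, B)          # fully populated and unsymmetric
lams, X = np.linalg.eig(A)
for lam, x in zip(lams, X.T):
    # each eigenpair of A still annihilates the original pencil
    assert np.allclose((lam * C - B) @ x, 0.0, atol=1e-10)
```

The sparsity and symmetry of C and B are nevertheless destroyed in A, which is why a direct treatment of the pencil, as in the following example, can be preferable.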
We now give an example in which C, B are sparse and symmetric.
The reader is invited to study this closely, since it illustrates more than the possible treatment of a pencil.
Example 1
Consider the pencil [formula] here C is tridiagonal and symmetric; B is diagonal.
Also (though this is immaterial for our present purpose) both are centrosymmetric.
The pencil thus has particular simplicity as it stands.
We are required to find the eigenvalues [formula]i which satisfy [formula], and the corresponding vectors.
First, if we attempt this by evaluating A = C-1B we find [formula] Thus A is fully populated and is not symmetric (though, as it must, it remains centrosymmetric).
We therefore return to the original form (8).
A possible way of dealing with this is to use the location method (see 2.6.1).
If we write [formula] then [formula] (since P is of order 4); we therefore require to assign three arbitrary values to [formula] and then, by simple triangulation, evaluate [formula] for each of these.
Since the product of the roots is [formula], we may choose [formula] = 2, 1, −1 as reasonable arbitrary values.
Then we find, using a typical direct triangulation [formula] while [formula].
Fortuitously, we have found one of the zeros of [formula].
Disregarding this for the moment, we write [formula] and use the above values; we find [formula] the solution of which is [formula].
Hence [formula] or [formula] To find the zeros of [formula] ([formula]) we may, for example, use the companion matrix (see 1.16), which, in this example, is [formula] the eigenvalues of which are the zeros of (10).
We may find these in the usual way by iteration: we repeatedly premultiply an arbitrary column, say [formula], by M. Successive columns are then given by Table 1.
We observe that, owing to the particular form of M, we need calculate only the top element of each column in this table; the remaining elements are those of the previous column, one step down.
In fact, we are only solving the regression formula [formula], which is equivalent to the iteration with M; this is in effect Bernoulli's method of solution of [formula] ([formula]) = 0.
To six places of decimals, we have now found [formula]1 = 6.464102.
Hence [formula] on extraction of the first factor.
If we set up a regression formula (or companion matrix) for the cubic, we find the next factor to be ([formula] + 1), as we have already noted.
In fact [formula] so that the eigenvalues are [formula].
We now look briefly at the eigenvectors.
Typically [formula] which is of course singular.
We treat it by the method of Equation (2.6.4), using the first three rows, to obtain, with x4 = 1, [formula] so that we find at once [formula].
In this way, with the eigenvalues in order of modulus, we find [formula] A point of interest may be noted: the individual vectors are either centrosymmetric or centroskew, just as a symmetric structure has either symmetric or antisymmetric modes of vibration.
We leave it to the reader to check that (see (3)) [formula] and also that [formula] are diagonal.
To complete this discussion, we may observe that the companion matrix, or its equivalent, the regression formula, may be used to find complex roots of a determinantal equation (see 1.7.6).
For example, suppose we find [formula] Then we may set up the regression formula [formula] and if we begin with [formula], we find for successive values of [formula] [formula] The periodic change of sign, with no settled ratio, indicates complex roots.
We therefore apply Equation (2.7.6.7) to the last 4 figures in sequences of 3 (we assume contributions from the other roots to be vanishingly small): [formula] which give us [formula].
Hence (13) has the factor [formula]; and we may write [formula] so that all the factors of (13) are complex.
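The regression-formula iteration (Bernoulli's method) used above may be sketched as follows for the dominant zero of a monic polynomial; the sketch and its function name are ours, and the recurrence is exactly the power iteration with the companion matrix in disguise.

```python
import numpy as np

def bernoulli_dominant_root(c, steps=100):
    # Dominant zero of  l^n + c[0] l^(n-1) + ... + c[n-1],
    # found by iterating the regression (recurrence) formula
    #   u_k = -(c[0] u_{k-1} + ... + c[n-1] u_{k-n});
    # successive ratios u_k / u_{k-1} tend to the dominant root.
    c = np.asarray(c, dtype=float)
    n = c.size
    u = [0.0] * (n - 1) + [1.0]            # arbitrary starting column
    for _ in range(steps):
        u.append(-np.dot(c, u[:-n - 1:-1]))   # u[-1], u[-2], ..., u[-n]
    return u[-1] / u[-2]
```

As in the text, a change of sign with no settled ratio would signal a dominant complex pair, which this simple ratio test cannot resolve.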
2.9.2 Deflation of a matrix pencil
The eigenvalues and vectors of a simple pencil (not necessarily symmetric) will satisfy [formula] Write these as [formula] then [formula] are evidently the eigenvectors belonging to the eigenvalue [formula]i of C-1B.
If we normalise them so that [formula] then the Sylvester expansion of C-1B (see 1.18) is [formula] so that we may express B in terms of C, etc., as [formula] Now suppose that we have found [formula].
Then we are able to evaluate, on the left-hand side, [formula] and evidently a new pencil, which has all the eigenvalues and vectors of the original pencil except that [formula]1 is replaced by 0, is [formula] This is Lancaster's deflation formula.
We may note two points: (i) since [formula] has a zero root, B1 must be singular; (ii) it is not necessary to normalise the vectors; we can write B1 generally as [formula]
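Lancaster's formula, in the unnormalised form of point (ii), may be sketched thus (the function name is ours):

```python
import numpy as np

def lancaster_deflate(B, C, lam1, x1, y1):
    # Lancaster's deflation: the pencil lC - B1 keeps every eigenpair
    # of lC - B except that lam1 is replaced by 0.  The right and left
    # vectors x1, y1 need not be normalised (point (ii) of the text).
    Cx, yC = C @ x1, y1 @ C
    return B - lam1 * np.outer(Cx, yC) / (yC @ x1)
```

For a symmetric pencil y1 = x1, and repeated application deflates further eigenvalues in turn, degrading the rank of the deflated matrix at each step.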
Example 1
We use the pencil defined in (2.9.1.8) and we assume first that we have found an eigenvalue [formula]1 = -1 and the associated eigenvector [formula].
Since the pencil is symmetric we shall have y = x.
Now, using (2.9.1.8), we have [formula] Hence, from (5) we find [formula] which, with B defined as Diag(1,2,2,1) gives [formula] This matrix is simply singular; indeed, as is indicated by (3), it is annihilated by postmultiplication by [formula]: in (3), [formula] is orthogonal to [formula], i = 2,3,..., n.
Accordingly, the pencil P1([formula]) has a zero eigenvalue.
The deflation can of course be repeated.
If, using P1([formula]), we now find an eigenvalue [formula]2 = 0.6 and the associated eigenvector [formula], then [formula] and then we may calculate [formula] which, with B1 given by (6), yields [formula] By inspection, B2 is doubly degenerate; it is of course annihilated by postmultiplication by [formula] or [formula].
A third deflation would produce B3, a unit rank matrix, which in this instance (n = 4) would be the last term in (3).
2.9.3 Iteration with submatrices of vectors
We conclude this section with a description of a method often used in the following circumstances: (a) the problem is formulated as a matrix pencil; if it is of dynamical origin, we are given a non-singular symmetric stiffness matrix C of order n and a corresponding symmetric mass matrix B, and the problem is to find certain modes and frequencies satisfying [formula], x being proportional to exp [formula]; (b) the order n is large, say [formula]; (c) interest attaches only to a relatively small number, say [formula], of consecutive frequencies and modes, beginning with the fundamental.
The process is one of iteration with a submatrix of p vectors, where p is a little greater than the number of eigenvalues sought, and involves the solution at each step of an eigenproblem of order p only.
Since B, C derive from a mechanical system and are non-singular, they will be pos. def., and accordingly (see Theorem VIII of 1.22) the system eigenvalues are all real, finite and positive.
If we write [formula] then the full problem may be written [formula] where X is the modal matrix, [formula] the spectral matrix, and A = C-1B the dynamical matrix.
It is assumed that the eigenvalues in [formula] (and the corresponding eigenvectors in X) are so ordered that [formula] has the eigenvalues [formula] in descending order of magnitude down the diagonal, so that [formula]1 and the first vector in X define the fundamental frequency and mode.
We are not concerned to solve (1) as a whole.
If it is partitioned in the form [formula] where Y is of order (n × p) and [formula] square, of order p, (2) gives [formula] and only the first of these equations concerns us.
However, we may note that (see 1.19) premultiplication by [formula] of the first equation in (1), and transposition, shows that [formula] are both diagonal.
For analytical purposes we may normalise X so that [formula] which imply, inter alia, [formula] Having set out some preliminary considerations, we now state the method.
Select a set of starting vectors Yo, arbitrary except that they must be linearly independent; then evaluate the (n × p) matrix [formula] and use Wo to form the two square matrices of order p [formula] Now solve the eigenproblem (any suitable method from this chapter may be used) [formula] for its modal matrix Mo; the spectral matrix Ao which emerges is (see below) a first approximation to Ay.
Then an improved approximation to Y is [formula] where do is an arbitrary non-singular diagonal matrix which may, for example, be used to make a homologous element in each of the corresponding columns of Y1 and Yo the same; do is useful, but not essential.
This completes one step.
The next step uses Y1 in (6) to give W1 = AY1, and the cycle of operations is repeated until convergence occurs.
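The cycle may be sketched as follows. This sketch is ours: the two projected matrices are taken as WᵀCW and WᵀBW, a Rayleigh-Ritz choice consistent with the normalisation (4), though the text's own matrices in (6) to (9) may differ in detail; a simple column scaling stands in for the diagonal matrix do.

```python
import numpy as np

def submatrix_iteration(C, B, p, steps=60, seed=1):
    # Iterate a submatrix Y of p vectors under A = C^{-1}B, solving at
    # each step only an eigenproblem of order p (assumed projections).
    n = C.shape[0]
    Y = np.random.default_rng(seed).standard_normal((n, p))
    for _ in range(steps):
        W = np.linalg.solve(C, B @ Y)          # W = A Y, without forming C^{-1}
        v, w = W.T @ C @ W, W.T @ B @ W        # two square matrices of order p
        lams, M = np.linalg.eig(np.linalg.solve(v, w))
        order = np.argsort(-lams.real)         # fundamental (largest) first
        lams, M = lams.real[order], M.real[:, order]
        Y = W @ M                              # improved approximation to Y
        Y /= np.abs(Y).max(axis=0)             # scaling in place of d0
    return lams, Y
```

The returned eigenvalues approximate the p largest of the pencil; as the text warns, only the first few of them should be trusted to full accuracy, which is why p is chosen a little greater than the number of eigenvalues sought.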
We may note some interesting aspects of the method.
(i) If Yo consists of one vector only, so does Wo, and then [formula] are scalars, as is the modal matrix [formula].
If we choose [formula] in (9) then [formula] and this is the single vector iteration discussed in 2.7.1.
Moreover, the trivial eigenproblem gives a first approximation to [formula]1 as the Rayleigh quotient [formula]. (ii) If Yo happens to be the correct set Y, the method makes this apparent at once, for then, in view of (3), [formula] so that, on use of (5), [formula] Thus both matrices are diagonal, so that the eigenproblem is again trivial, having the solution [formula].
Then [formula], when scaled appropriately, repeats [formula], showing that both equal Y. (iii) If Yo happens to be a set of linear combinations of the vectors in Y, the correct solution is obtained in one step.
For we may then write [formula] where P is a square matrix of order p, non-singular since the columns of Yo and of Y, separately, are linearly independent.
Then [formula] in view of (3).
Thus, on use of (5), [formula] On premultiplication by [formula], the eigenproblem (8) reduces to [formula], which has the solution [formula] if do is chosen as [formula]. The condition envisaged in (iii) is, in general, unlikely, unless for example a neighbour system (see Chapter 3) supplies a good approximation to Y. In most cases, we must write [formula], where R, like Yo, is of order (n × p).
In this general case [formula] in view of (1).
Then the eigenproblem matrices become [formula] where (4) has been used.
We therefore have to solve for [formula] [formula] Knowing Mo, we now use (11) to evaluate [formula] whence in turn we find [formula] Since [formula] is non-singular, we may reduce the eigenproblem based on W1 to [formula] which we solve for the unknown [formula].
A third step leads to [formula] It is clear from (13), (16) and (17) that after r steps we have to solve [formula] If [formula] is written as M, (18) in partitioned form is (see (10)) [formula] or, on expansion, [formula] Now, since all the eigenvalues in [formula] are greater than any of those in [formula], when r is sufficiently large the terms in [formula] become negligible compared with those in [formula], when (19) reduces to [formula] and if s is on-singular, this in turn reduces to [formula] which has the solution [formula].
This shows that convergence is complete.
Two things must be noted here.
First, although when r is large the terms in [formula] dominate, in an overall sense, those in [formula], yet the smallest element in [formula] is not necessarily much larger than the largest element in [formula].
As a result calculation of the last two or three eigenvalues in [formula] may not be very accurate.
It is for this reason that we choose p to be a little greater than the number of eigenvalues sought.
Second, reduction of the eigenproblem to (20) requires s to be non-singular.
Now R has p linearly independent columns, and therefore also p linearly independent rows.
Thus s can be non-singular; but it is not necessarily so, and if it is singular the Y given by the method may be deficient.
For example, if the top row of s were null, the first vector in X (the fundamental mode) would be absent in the make-up of Yo, and this absence would persist.
There is no certain way of avoiding this difficulty when Yo is chosen arbitrarily; but in practice it does not often arise.
Although the eigenproblem to be solved at each step is vastly less time-consuming and expensive than direct methods applied to the full system, yet it does take much time.
However, after a few steps v and [formula] usually become heavily diagonal, so that Mr tends towards I; this occurs before the vectors Yr have converged.
In these circumstances, use of the Collar-Jahn method, described in Chapter 3, may greatly expedite the solution.
An example of this is given in 3.9.
Further reference to the method is also made in 8.5.
Example 1
The small-order example which is all we can give here does not of course do justice to the method, but at least shows how it works.
We begin with an illustration of the situation in (iii) above; the reader is invited to check the arithmetic.
Let [formula] Then [formula] Suppose now we are given, or choose, [formula] It then follows that [formula] The eigenvalues of [formula] are thus 10 and 5, and [formula]. The reader should check that these are eigenvectors of A and that they correspond to eigenvalues 10 and 5.
We have thus obtained our solution, in this case, in one step.
It will be found that each column of [formula] is, in fact, a linear combination of the columns of [formula]
Example 2
We use the same A, B, C, as in Example 1, but select as our starting vector submatrix [formula] In choosing Yo we have selected simple vectors which are clearly linearly independent, and which between them bring all elements in A into operation in the formation of Wo.
We then find [formula] where the product is quoted to four significant figures.
Its eigenvalues are 9.805, 2.411 and, very approximately (great accuracy is not necessary), [formula] which yields, again approximately, [formula] Proceeding in this way, we find at the end of the fifth step [formula] and in the sixth step this gives [formula] These are so nearly diagonal that the eigenproblem is trivial; the eigenvalues are 10, 4.9987.
However, we do not obtain convergence until after the 10th step, which gives eigenvalues 10, 5 and [formula]. The second column should be {1, 1, 0, −1}.
2.9.4 A variant
We now give a variant of the method of 2.9.3; we do so partly because the variant can save computer time, especially if many iterations are required, but also because it enables us simply to bring out certain features of the method, which lead to possible modifications.
For simplicity, we assume that the eigenvalues sought are all different.
We are given the matrices C and B, both symmetric and pos. def.
We may therefore use Choleski's method (see 2.4.2) to find a lower triangular matrix L such that [formula], and may then evaluate [formula]; accordingly, the equation [formula] may be written [formula], and S is symmetric and pos. def.
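The reduction may be sketched thus (illustrative values, ours): with C = LLᵀ, the pencil eigenproblem Bx = [formula]Cx becomes the standard symmetric problem Sw = [formula]w, where S = L⁻¹BL⁻ᵀ and w = Lᵀx, so that the pencil eigenvectors are recovered as x = L⁻ᵀw.

```python
import numpy as np

# Choleski reduction of a symmetric pencil to standard symmetric form.
C = np.array([[2., -1, 0], [-1, 2, -1], [0, -1, 2]])       # pos. def.
B = np.array([[1., 0.5, 0], [0.5, 2, 0.5], [0, 0.5, 1]])   # symmetric

L = np.linalg.cholesky(C)                           # C = L L^T
S = np.linalg.solve(L, np.linalg.solve(L, B).T).T   # S = L^{-1} B L^{-T}

lams, W = np.linalg.eigh(S)        # symmetric routines now apply to S
X = np.linalg.solve(L.T, W)        # recover pencil vectors: x = L^{-T} w
```

Working with the single symmetric matrix S, rather than with C, B and C⁻¹B separately, is precisely the economy exploited in what follows.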
Now, instead of using the three matrices C, B and A = C-1B as in 2.9.3 we work only with S.
Suppose we now proceed as in 2.9.3: we begin with the submatrix Yo and successively evaluate [formula] and we then solve for Mo the eigenproblem [formula] Then a new approximation to Y is [formula] At this point we may ask: what is the purpose of the eigenproblem?
The answer appears if we premultiply and postmultiply Equation (5) by appropriate quantities to give, since do permutes with [formula], [formula] on use of (6).
Transposition of (7) shows (compare 1.19) that [formula] permutes with [formula] and is therefore diagonal, as is [formula].
The columns of [formula] are thus mutually orthogonal; this is the purpose of evaluating Mo: its use orthogonalises the columns, and our ultimate objective is the orthogonal set Y.
A second step with Yo in (2) replaced by Y1, leads to the eigenproblem (compare (5)) [formula] Again we may pause and observe that (8) gives us [formula] directly; we do not need to find Mo first from (5) and then M1 from (8).
Thus the step (5) is unnecessary.
Then, in its turn, the step (8) is unnecessary; and so on.
Indeed, in theory all that is required is that a sufficient number of direct iterations is made, with the columns orthogonalised as the last step; in practice, however, the numbers would become increasingly ill-conditioned as S is raised to a high power and approximates to a unit rank matrix.
However, it is probably sufficient if the columns are orthogonalised at, say, every fourth iteration.
If we do this, the procedure is as follows.
Begin with Yo and evaluate successively [formula].
At this stage, evaluate [formula] and solve the eigenproblem [formula] for M. Then a new approximation, which has mutually orthogonal columns, is [formula], with which in place of Yo the cycle is repeated, and so on.
Example 1
We use the same matrices B and C as in Example 2.9.3.1. Then, if [formula], [formula], we find, to four decimal places, [formula] [formula] As our starting matrix we choose [formula] This has mutually orthogonal columns; however, as we shall see later, it is not a good initial choice for the first column.
We now form [formula], at each stage making the "11" and "22" elements unity.
We find [formula]; we require [formula] as it stands.
We now form v, [formula] as in (9), (10), and solve (11) for M. To sufficient accuracy, we find [formula] and then a new Yo is, with d chosen appropriately, [formula] and it may be checked that these two columns are, sufficiently nearly, orthogonal.
We now repeat the cycle.
This time we find [formula] and [formula] and again, sufficiently nearly, the columns are orthogonal.
A third cycle leads to the eigenvectors (and eigenvalues) of S: [formula] We leave it to the reader to establish that the eigenvectors of the original system, given by premultiplication of w by [formula] may be written [formula]
2.10 IMPROVEMENT OF THE ACCURACY OF EIGENVALUES AND VECTORS
It frequently happens, both in dynamical problems and in other eigenvalue problems that we have approximate values for one or more eigenvalues (and perhaps vectors) and require to obtain more exact values.
This problem, from another viewpoint, is discussed more fully in Chapter 3.
Here we shall deal only with some simple cases.
2.10.1 Problem formulated as a matrix pencil
For simplicity, let us first discuss the undamped oscillations of a mechanical system having n degrees of freedom, the equations of motion of which have been derived by Lagrangian methods from energy considerations; they will appear, in the usual notation of dynamics, as [formula] Here x is the column of coordinates; in what follows, we assume the last element of x to be non-nodal (if it is not, it can be made so by rewriting the equations in a different order).
Also C, the inertia matrix, will be symmetric and pos. def. and B, the stiffness matrix, also symmetric and non-neg. def; finally, F is a column of applied forcing functions.
In simple harmonic motion, these forces, and x, will be proportional to exp [formula], and if we then write λ for ω2, (1) reduces to the usual formula [formula].
We observe that, in addition to the given matrices C and B, (2) contains 2n + 1 other quantities, viz. λ, x, and F. Since we have n equations, we can determine any n of these quantities in terms of the remaining n + 1, to which we can ascribe arbitrary values.
Here, we shall
(a) choose λ as an independent variable;
(b) prescribe the value unity for [formula];
(c) prescribe the value zero for each of the first n - 1 functions in F, the last being written f.
Then f and the coordinates [formula] become functions of λ, and we note that when f vanishes (i.e. all the functions F vanish in (2)), λ will be an eigenvalue and [formula] the corresponding eigenvector.
We now write, in partitioned form, [formula], [formula], and, since the pencil in (2) is symmetric, [formula], where [formula] is a square submatrix of K of order n − 1.
Then (2) becomes [formula], which yields, progressively, [formula], [formula].
Since the submatrices in (5) are all functions of λ, (7) and (8) give f and the coordinates (or x) as functions of λ.
Now suppose we have an approximation [formula] to an eigenvalue.
Since f is a function of λ which vanishes at the eigenvalues, we may use it in the Newton-Raphson method to obtain an improved approximation [formula]: [formula], the zero suffixes indicating that the functions are calculated for [formula].
Equation (8) is not suitable for differentiation; instead, we obtain another form for f by premultiplying (6) by [formula]; we obtain [formula], and since K is symmetric it follows that [formula], which, with K given by (5), reduces to [formula].
In a numerical calculation, we use (11) for [formula] and either (8) or (10) for f.
If we use the latter, we must note that, at [formula], [formula], when Equation (9) reduces to [formula], i.e. the well-known Rayleigh quotient.
In the numerical evaluation of this, we obtain x from (7) with [formula].
Having obtained [formula] we can adopt it as a new [formula] and repeat the cycle of approximation, which is known to be quadratically convergent: i.e. if [formula] is the true eigenvalue, then for small differences [formula].
Equation (9), with [formula] given by (8), is analytically precisely equivalent to (12).
In numerical applications, they may differ, depending on the accuracy with which [formula] and  are evaluated.
Probably (12) is the easier to use.
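The cycle from (7) to (12) may be sketched as follows for the symmetric pencil; the sketch and its function name are ours, and the scheme shown is the Rayleigh-quotient form (12): with the last coordinate prescribed as unity and zero forcing on the first n − 1 equations, the first n − 1 equations are solved for x, and the Rayleigh quotient then supplies the improved eigenvalue.

```python
import numpy as np

def improve_eigenvalue(B, C, lam0, cycles=6):
    # Refine an approximate eigenvalue lam0 of the symmetric pencil
    # (B - lam C) x = F.  Quadratically convergent near a true value.
    n = B.shape[0]
    lam = lam0
    for _ in range(cycles):
        K = B - lam * C                     # K(lam) = B - lam C
        x = np.empty(n)
        x[-1] = 1.0                         # prescribe unity for x_n
        # zero forcing on the first n-1 equations: solve them for the rest
        x[:-1] = np.linalg.solve(K[:-1, :-1], -K[:-1, -1])
        lam = (x @ B @ x) / (x @ C @ x)     # Rayleigh quotient, as in (12)
    return lam, x
```

As the text requires, the last coordinate must be non-nodal, so that the leading submatrix of K remains non-singular even as lam converges.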
Example 1
In Equation (1), let [formula].
Suppose we are given an approximate eigenvalue [formula].
We evaluate [formula], which we have partitioned as in (5).
Then x is given by (7) as [formula], and the reader may check that this yields [formula].
We give this to four decimal places only: it is not necessary to calculate x to great accuracy, but, once found, it must be used consistently, and calculations made accurately.
We deduce that for [formula], the approximate eigenvector is [formula], and we are able to calculate [formula], [formula], whence [formula].
If we round this off to a new [formula], we quickly find that [formula] also, i.e. 40 is an exact eigenvalue; also the corresponding eigenvector becomes [formula].
If, instead of the Rayleigh quotient, we use (9) directly, we require from (8) [formula].
Hence (see also (16)) [formula], in close agreement with (16).
2.10.2 Problem formulated in terms of a single matrix
If we premultiply (2.10.1.2) by [formula] and write [formula], we obtain the problem in terms of the characteristic matrix of A: [formula].
Note that here A is, in general, not symmetric; however, it evidently possesses the same eigenvalues as the system (2.10.1.2).
Let us now abandon the connection with a mechanical system, and treat this as a problem involving any given real matrix A, the eigenvalues and vectors for which have to be found.
The treatment is closely similar to that of 2.10.1.
As before, we choose λ as an independent variable, assign the value unity to [formula] and zeros to the first n − 1 elements of G: [formula]; we also write, remembering that A is not symmetric, [formula].
Then Equation (2) gives us [formula], from which we obtain as before [formula], [formula].
We now look for the left vector [formula] corresponding to x as the right vector; in 2.10.1 this was [formula]: here it is different.
In place of (2) we write [formula]; by the same argument as before, we set [formula], [formula], hence from (4), (9) and (10) [formula], yielding [formula], [formula], and on comparison of (12) with (7) we see that h and g are identical; this is not surprising, since they are evidently similar functions and have the same zeros, i.e. the eigenvalues.
As before, we now employ g in a Newton-Raphson application to improve the accuracy of any given approximate eigenvalue [formula].
Premultiplication of (5) by [formula] gives [formula], and then, differentiating term by term, we have [formula], which in view of (4) reduces to [formula].
Accordingly, as before [formula].
This is the parallel to (2.10.1.12); as before, we can, if we wish, use g as given by (7) for alternative treatment, but we shall not pursue this here.
Example 1
In (2), let [formula].
This matrix is, in fact, obtained from our previous example by evaluating [formula].
However, we shall treat a different eigenvalue: suppose we are told that there is an eigenvalue in the neighbourhood of [formula].
We evaluate and partition [formula].
Then we find at once from (6) and (11) that [formula], [formula], and hence, from (15), [formula].
If we round this off to a new [formula], and repeat the calculation  we leave it to the reader to check this  we find [formula]; finally, rounding this off to 20 and repeating again, we find that 20 is an exact eigenvalue and that the corresponding right and left eigenvectors are, respectively, [formula], [formula].
2.10.3 Factorisation of A into two symmetric matrices
In Equation (2.10.2.1) we derived A as the product of two given symmetric matrices, [formula] and B; this was done for convenience in the examples.
For completeness, we interpolate here a brief discussion of the reverse procedure: it was shown in 1.19 and Theorem V of 1.22 that a matrix can always be factorised into two symmetric matrices, but the subject was not pursued there.
Here we obtain the most general solution.
We first continue the discussion of 1.19, and treat the most common case in which A has distinct eigenvalues.
It was shown there that A permutes with [formula], which is therefore diagonal.
This diagonal matrix, D, is arbitrary, since X is arbitrary to within a postmultiplying diagonal matrix.
Write [formula], where P, Q, R, ... are arbitrary.
Now if (see 1.18) we write [formula], C may now be expressed as [formula], so that C is expressible as an arbitrary linear sum of the n symmetric unit rank matrices [formula].
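The construction just described can be sketched in Python with NumPy. The matrix and the weights below are invented for illustration (they are not the book's numbers): with A = XΛX⁻¹ and Y = X⁻¹, whose rows are the left eigenvectors, C = YᵀDY is symmetric for any diagonal D, and B = CA = YᵀDΛY is symmetric as well, so A factorises as C⁻¹B:

```python
import numpy as np

# Illustrative matrix with distinct eigenvalues 2, 3, 5
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])

evals, X = np.linalg.eig(A)
Y = np.linalg.inv(X)             # rows of Y are the left eigenvectors

p = np.array([1.0, -2.0, 0.5])   # arbitrary weights (P, Q, R of the text)
C = Y.T @ np.diag(p) @ Y         # linear sum of rank-one symmetric matrices
B = C @ A                        # B = Y^T diag(p) Lambda Y, also symmetric

print(np.allclose(C, C.T), np.allclose(B, B.T))
print(np.allclose(np.linalg.inv(C) @ B, A))
```

Any choice of non-zero weights gives a valid factorisation, reflecting the n arbitrary quantities counted in the text.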
Next consider the case where A has two equal eigenvalues, λ say, but is not defective.
Then A can be written with the (2 × 2) scalar submatrix λI in the leading diagonal position.
This means that D is not necessarily diagonal; corresponding to λI there may be a (2 × 2) block which is arbitrary except that it must be symmetric and non-singular.
This therefore contains three arbitrary quantities, giving n + 1 in all.
Finally, suppose A has two equal eigenvalues, λ say, but is defective.
As previously, we note that [formula] since [formula] is symmetric.
Accordingly [formula] with K as defined in Theorem V of 1.22.
Thus [formula] permutes with [formula].
This implies that [formula] is diagonal except for the leading (2 × 2) submatrix.
Now the following submatrices permute (see also 1.21) [formula], [formula], where a, b are arbitrary.
The second of these is thus the leading submatrix of [formula], whence in [formula] it is [formula], which is symmetric and contains just two arbitrary quantities; there are thus just n arbitrary quantities in all.
Extension to more complicated cases is not difficult.
However, these studies employ the modal and spectral matrices, and in practice these may not be known.
We can, however, find directly the general solution of [formula].
Write [formula], so that C contains n(n + 1)/2 unknown elements: n in the diagonal, viz. p, q, r, ..., and n(n - 1)/2 elsewhere, viz. a, b, c, ....
Now the conditions that B is symmetric provide just n(n - 1)/2 linear algebraic equations connecting the unknowns, so that we can determine a, b, c, ... in terms of p, q, r, ..., which are arbitrary.
The procedure is best illustrated by means of a numerical example: we choose A to be the matrix of (2.10.2.16).
Then [formula].
Then the condition that [formula], for example, leads to [formula], and there are five other such equations.
We set them out thus: [formula]. These are then solved, e.g. by direct operation on rows, when a, b, c, ... emerge as quantities linear in p, q, r, s.
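The same elimination can be organised for machine computation. The sketch below is illustrative (the matrix is invented, not the matrix of (2.10.2.16)): since C is symmetric, the condition that B = CA be symmetric is equivalent to CA − AᵀC = 0, which is linear in the n(n + 1)/2 unknown entries of C; the general solution is then the null space of the resulting coefficient matrix, of dimension n when the eigenvalues are distinct:

```python
import numpy as np

def symmetric_factor_basis(A, tol=1e-10):
    """Basis of all symmetric C with C @ A symmetric, i.e. CA - A^T C = 0.
    Unknowns: the n(n+1)/2 entries of C.  CA - A^T C is antisymmetric,
    so its strict upper triangle gives n(n-1)/2 linear conditions."""
    n = A.shape[0]
    basis = []                        # basis of symmetric matrices
    for i in range(n):
        for j in range(i, n):
            E = np.zeros((n, n))
            E[i, j] = E[j, i] = 1.0
            basis.append(E)
    rows = []
    for E in basis:
        R = E @ A - A.T @ E
        rows.append([R[i, j] for i in range(n) for j in range(i + 1, n)])
    M = np.array(rows).T              # conditions x unknowns
    _, s, Vt = np.linalg.svd(M)
    null_dim = sum(1 for i in range(Vt.shape[0])
                   if i >= len(s) or s[i] < tol)
    # null-space vectors correspond to admissible symmetric matrices C
    return [sum(c * E for c, E in zip(v, basis))
            for v in Vt[Vt.shape[0] - null_dim:]]

# Illustrative companion matrix with distinct eigenvalues 1, 2, 3
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [6.0, -11.0, 6.0]])
Cs = symmetric_factor_basis(A)
print(len(Cs))                        # n independent solutions
C = sum(Cs)                           # one particular combination
print(np.allclose(C @ A, (C @ A).T))
```

The null-space dimension reproduces the count of arbitrary quantities in the text: n for distinct eigenvalues, and more or fewer in the repeated-eigenvalue cases discussed above.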
To set out the result compactly, we write [formula], where [formula]... have zero diagonals except for a unit in the first, second,... place respectively.
In this example, we can write [formula], where to keep the elements numerically simple, we have extracted the factor 1/3 from [formula] and [formula].
This is evidently the most general form of C. Postmultiplication by A yields B in the form [formula].
When p, q, r, s are assigned numerically, C and B are known and then the factors [formula] and B of A are found.
We can recover the matrix C (and also B) defined in (2.10.1.13) by assigning to p, q, r, s the values 1, 1.52, 3, 0.92 appearing in the diagonal of C.
The following characteristics of [formula], [formula], etc. are to be noted.
First, since each contains zeros in its diagonal, it cannot be positive definite.
Next, the ranks of the matrices do not conform to any simple pattern: [formula] is of rank 4, [formula] and [formula] of rank 3 and [formula] of rank 2.
Finally, in relation to our main problem: since for the Rayleigh quotient (which gives eigenvalues identical to those of A) [formula], and since p, q, r, s are arbitrary, it follows that [formula], i = 1, 2, 3, 4, though this result is not particularly helpful in practice.
2.10.4 Another method for improvement of eigenvalues
We discuss here the same problem as that in 2.10.2, but vary the treatment.
The eigenvalues are determined by Equation (2.10.2.5) with g = 0, viz. [formula], where [formula] is the submatrix of A corresponding to [formula] and [formula] is the last diagonal element of A. From this we find [formula], [formula], and elimination of the subvector leads to the equation for the eigenvalue: [formula].
This equation is not easy to solve numerically as it stands; however, if we are given an approximation [formula] to an eigenvalue, we write [formula] for the exact eigenvalue, which therefore satisfies (4), so that we obtain [formula], which is now an equation for the correction. Write [formula]; then [formula].
Thus [formula], provided the correction is sufficiently small.
Insertion in (5) leads to [formula], or, if [formula], [formula], [formula], etc., [formula].
Finally, if [formula] is written [formula], we obtain the scalar equation [formula].
Provided the series converges, this may be solved for the correction in any suitable way; often a regression procedure, beginning with the correction set to zero on the right-hand side to give a new approximation on the left, will solve the equation in two or three steps.
Comparison of (8) with (3) shows that, when the correction has been found from (9), [formula], so that both eigenvalue and vector are found.
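The elimination and the iteration that solves the scalar equation can be sketched as follows. The Python fragment below is illustrative (the matrix and starting value are invented, not the book's example) and, rather than truncating the series, it iterates the exact scalar relation λ = aₙₙ + bᵀ(λI − Ā)⁻¹c obtained by eliminating the subvector of the eigenvector; like the regression procedure of the text, it converges only when the mapping is contractive near the root:

```python
import numpy as np

def refine_eigenvalue(A, lam0, steps=50, tol=1e-12):
    """Partition A as [[Abar, c], [b^T, a_nn]], scale the last entry of
    the eigenvector to 1, eliminate the subvector xbar, and iterate the
    scalar equation  lam = a_nn + b^T (lam*I - Abar)^{-1} c,
    starting from the given approximation lam0."""
    n = A.shape[0]
    Abar, c = A[:n-1, :n-1], A[:n-1, n-1]
    b, a_nn = A[n-1, :n-1], A[n-1, n-1]
    lam = float(lam0)
    for _ in range(steps):
        xbar = np.linalg.solve(lam * np.eye(n-1) - Abar, c)
        new = a_nn + b @ xbar
        converged = abs(new - lam) < tol
        lam = new
        if converged:
            break
    x = np.append(xbar, 1.0)          # eigenvector with last entry 1
    return lam, x

# Illustrative symmetric matrix; we are "told" of an eigenvalue near 1.1
A = np.array([[5.0, 1.0, 0.5],
              [1.0, 3.0, 0.5],
              [0.5, 0.5, 1.0]])
lam, x = refine_eigenvalue(A, 1.1)
print(lam)
print(np.allclose(A @ x, lam * x))
```

When the coupling terms b and c are small relative to the separation of the approximate eigenvalue from the spectrum of Ā, each sweep reduces the error by a large factor, matching the text's observation that two or three steps often suffice.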
Example 1
Once again, we use the matrix specified in Equation (2.10.2.16) and we are given that it has an eigenvalue near [formula].
Then, in (6), with n=4, [formula].
Also, [formula]; [formula]; [formula].
Hence, using (7), we obtain [formula], [formula], [formula], [formula], [formula], and it is in fact unnecessary to proceed further.
We now evaluate the scalars [formula]; they become [formula], [formula], [formula], [formula], [formula].
Hence Equation (9), for the correction, becomes [formula], and if we collect the linear terms and rationalise, [formula]. Solution of this, by regression or otherwise, gives [formula]; [formula].
Also, using (10) with [formula], we find [formula] or [formula].
